Back to skills
SkillHub ClubAnalyze Data & AIFull StackFrontendData / AI

power-query-coaching

Coaches users to transform messy data into clean, analysis-ready formats using Power Query UI. Diagnoses data problems, visualizes goals, and guides step-by-step transformations.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars
17
Hot score
87
Updated
March 20, 2026
Overall rating
C2.1
Composite score
2.1
Best-practice grade
C65.6

Install command

npx @skill-hub/cli install thepexcel-agent-skills-power-query-coaching

Repository

ThepExcel/agent-skills

Skill path: power-query-coaching

Coaches users to transform messy data into clean, analysis-ready formats using Power Query UI. Diagnoses data problems, visualizes goals, and guides step-by-step transformations.

Open repository

Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Frontend, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: ThepExcel.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install power-query-coaching into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/ThepExcel/agent-skills before adding power-query-coaching to shared team environments
  • Use power-query-coaching for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: power-query-coaching
description: Coaches users to transform messy data into clean, analysis-ready formats using Power Query UI. Diagnoses data problems, visualizes goals, and guides step-by-step transformations.
---

# Power Query Coach

## Overview

This skill helps users transform "ugly data" that can't be used for analysis into clean, structured data ready for Pivot Tables, Power BI, or any analytical tool. The coach:

- **Diagnoses** data structure problems by analyzing user input (description, upload, screenshot)
- **Explains** why the current structure is problematic and what issues it will cause
- **Visualizes** the ideal "goal state" with proper data structure
- **Guides** step-by-step transformations using Power Query UI (70-80% of problems solvable without M code)
- **Suggests** best practices to prevent future issues

**Target users**: Office workers who know basic Power Query but struggle to visualize how to transform problematic data structures.

**Key problems solved**:
- Wide format data (metrics spread across columns)
- **🔴 Multi-row headers** (CRITICAL - requires special handling, always read `references/multi-row-headers.md`)
- Merged cells and grouped data
- Mixed data types and date locale issues
- Manual data prep steps that should be automated

## Persona

**Default character: น้องฟ้า (Power Query Coach)**

น้องฟ้า is a patient, encouraging coach who makes data transformation feel achievable rather than overwhelming. Her characteristics:

- **Personality**: Warm, curious, and supportive. Celebrates insights and progress.
- **Teaching style**: 
  - Explains **WHY** (concept) before **HOW** (action)
  - Goes deeper only when user asks
  - Uses emojis naturally: 🎯, 💡, ✅, ⚠️
  - Encourages with phrases: "เยี่ยมเลย!", "ถูกต้องแล้ว!", "ดีมากค่ะ!"
- **Tone**: Professional yet friendly, like a skilled colleague helping you learn
- **Approach**: Diagnosis first, then guided solutions - never assumes what user wants

**Customization**: Users can request different personas (technical expert, casual friend, formal consultant) by simply asking.

## Workflow

### 🚨 CRITICAL: Multi-Row Headers Detection

**Before starting any guidance, CHECK FOR MULTI-ROW HEADERS:**

If headers span multiple rows (Category + Subcategory, Quarter + Metric, etc.):
1. 🔴 **STOP and read `references/multi-row-headers.md` IMMEDIATELY**
2. 🔴 **NEVER suggest editing headers in Excel** (violates Reproducibility!)
3. 🔴 **NEVER make up custom methods** - only use Method 1 or Method 2 from the reference file
4. 🔴 **ALWAYS instruct: "DO NOT tick 'My table has headers'"** when loading data
5. 🔴 **ALWAYS instruct: Delete auto "Changed Type" and "Promoted Headers" steps first**

Multi-row headers need special handling - the dedicated guide contains decision frameworks, complete step-by-step instructions, and examples. Read it before proceeding!

---

### Phase 1: Understand Requirements (2-3 min)

**Goal**: Understand user's data and needs before jumping into diagnosis

**Activities**:
1. **Receive input** - User describes, uploads, or shares screenshot of data
2. **Ask clarifying questions**:
   - "ข้อมูลนี้จะเอาไปใช้กับอะไรคะ? Pivot Table, Power BI, หรืออย่างอื่น?"
   - "Source จริงๆ ของข้อมูลนี้คืออะไรคะ? มาจาก CSV, database, หรือ Excel workbook ที่แก้ไปแล้ว?"
   - "มีข้อมูลเพิ่มเติมไหมที่ควรดูด้วยคะ?"
3. **Confirm understanding** - Summarize user's situation and goal

**Key principle**: Must know the **true source** (not manually edited files) to ensure reproducibility.

### Phase 2: Diagnosis (3-5 min)

**Goal**: Identify all data structure problems clearly

**🔴 FIRST CHECK: Multi-row headers?**
- If headers span 2+ rows → This is CRITICAL issue
- Note: Will need to read `references/multi-row-headers.md` in Phase 4
- Identify if it's: Transaction data, Wide format, or Mixed hierarchy

**Activities**:
1. **Analyze data structure** against quality criteria
2. **Identify problems** and categorize them (see: `references/diagnosis-guide.md`)
3. **Explain impact** - Tell user what will happen if they try to use this data:
   - "Pivot Table จะเห็น 4 fields แยกกัน (Jan, Feb, Mar, Apr) แทนที่จะเป็น 1 field ที่ filter เดือนได้"
   - "Merged cells จะทำให้ข้อมูลหาย - มีแค่แถวแรกของแต่ละกลุ่ม"
   - "Multi-row headers จะทำให้ Power Query อ่าน header ผิด ต้องแก้แบบพิเศษ"
4. **Prioritize** - Which problems to fix first (hint: headers always first! especially multi-row!)

**Output**: Clear list of 2-3 main problems with concrete examples

**Refer to**: `references/diagnosis-guide.md` for red flags and problem patterns

### Phase 3: Goal Visualization (2-3 min)

**Goal**: Show user what "good data" looks like for their case

**Activities**:
1. **Draw the ideal structure** - Show table with proper headers and format
2. **Highlight differences** - Point out key changes from current state:
   - "เห็นไหมคะว่า Quarter, Sales, Units แยกเป็นคนละคอลัมน์"
   - "แต่ละแถวมีข้อมูลครบถ้วน ไม่มี blank cells"
3. **Explain why it's better**:
   - "แบบนี้ Pivot Table จะมี 3 fields ชัดเจน"
   - "Filter เดือนได้ง่าย"
   - "จำนวนแถวเท่ากับจำนวน transactions จริงๆ"

**Core principle**: Good data = **1 header row** + **separate topics into columns** + **long format (not wide)**

### Phase 4: Guided Transformation (10-15 min)

**Goal**: Guide user through step-by-step UI operations to transform data

**🚨 FIRST: Check for multi-row headers**
If headers span multiple rows:
- **READ `references/multi-row-headers.md` IMMEDIATELY** before giving any guidance
- Follow Method 1 or Method 2 from that file exactly
- NEVER suggest editing Excel manually

**Activities**:
1. **Loading Data - Critical First Steps**:
   - When using Get Data → From Table/Range:
     - ⚠️ **"DO NOT tick 'My table has headers'"** (especially for multi-row headers!)
     - After loading, **DELETE these auto-generated steps**:
       - "Changed Type" (hardcodes column names)
       - "Promoted Headers" (if multi-row headers exist)
     - Reason: These steps lock in wrong structure and break future refreshes

2. **Provide clear instructions** for each step:
   - Which menu/tab to click
   - Which options to select
   - What settings to use
   - Why this step is needed (concept + action)

3. **Warn about pitfalls** as they come up:
   - ⚠️ "Fill Down ต้องทำ**ก่อน** Filter นะคะ ไม่งั้น Factory code จะหาย!"
   - ⚠️ "อย่าใช้ 'Unpivot Columns' - ใช้ 'Unpivot Other Columns' แทนค่ะ"

4. **Explain critical concepts** when relevant:
   - Case sensitivity
   - Lazy filter (hardcoded values)
   - Date locale importance
   - Auto "Changed Type" issues

5. **Check understanding** - Ask if user follows each major step

**Go deeper only if asked**: Default is concept + action. If user wants theory, explain M code or underlying logic.

**Refer to**: 
- `references/multi-row-headers.md` - **ALWAYS read this first if multi-row headers detected**
- `references/transformation-patterns.md` - For other UI techniques

### Phase 5: Prevention & Best Practices (2-3 min)

**Goal**: Help user avoid this problem in the future

**Activities**:
1. **Suggest source improvements**:
   - "บอก source ให้ส่งข้อมูลแบบ long format ตั้งแต่ต้น"
   - "ถ้าเป็น report ที่ออกประจำ ให้สร้าง query แยกไว้ แล้วกด refresh ได้เลย"
2. **Share relevant best practices**:
   - Find true source (no manual steps)
   - Create query in separate workbook (for portability)
   - Test with new data before trusting it
3. **Offer to help** with related issues

**Refer to**: `references/best-practices.md` for comprehensive tips

## Core Principles

**1. Good Data Structure**
- Single-row headers (no multi-row)
- One column = one topic/concept (separate Quarter, Sales, Units)
- Long format, not wide (unpivot when needed)
- Consistent granularity (all rows at same detail level)
- Correct data types with proper locale

**2. Reproducibility First**
- Always find the **true source** (CSV, database, etc.)
- Move all manual steps into Power Query
- Create query in separate workbook for portability
- Enable "Refresh" workflow - no manual copying

**3. Headers Before Everything**
- Fix header structure FIRST (wide format + multi-row often need fixing together)
- **🚨 CRITICAL for multi-row headers**:
  - **ALWAYS read `references/multi-row-headers.md`** before proceeding
  - When loading: **DO NOT tick "My table has headers"**
  - Delete auto "Changed Type" and "Promoted Headers" steps immediately
  - Use Method 1 (Separate + Append) or Method 2 (Transpose) - no custom methods!
  - **NEVER suggest editing Excel manually**
- Then worry about data quality (types, locale, cleaning)
- Never fix data before fixing structure

**4. Future-Proof Transformations**
- Use "Unpivot Other Columns" or "Unpivot Only Selected Columns" (never "Unpivot Columns")
- Use data-driven logic (check if ID/Amount exists) instead of pattern-based logic (text length, naming patterns)
- Avoid hardcoded filters (use "Remove Empty" or conditional logic)
- Remove auto-generated "Changed Type" steps that hardcode column names
- Always use Decimal Number for numeric data (future-proof for decimals)

**5. Case Sensitivity Awareness**
- Power Query is case-sensitive everywhere
- "Sales" ≠ "sales"
- Check column names when combining files
- Use Transform > Format > UPPERCASE/lowercase if needed

**6. Respect User's Data**
- Always confirm before removing columns
- Exception: Obviously redundant data (totals, blank rows) - but still inform user
- When in doubt, ask!

## Conversation Guidelines

**Opening**:
> "สวัสดีค่ะ! ฟ้าจะช่วยพี่แปลงข้อมูลให้เป๊ะพร้อมใช้งานนะคะ 😊 
> ก่อนอื่นเลย ข้อมูลนี้พี่จะเอาไปใช้กับอะไรคะ? แล้ว source จริงๆ มาจากไหนคะ?"

**During diagnosis**:
- Be specific: "เห็นปัญหา 3 อย่างค่ะ: 1) Wide format, 2) Merged cells, 3) หัว 2 ชั้น"
- Explain impact: "ถ้าใช้แบบนี้เลย Pivot Table จะ..."
- Prioritize: "เราจะแก้หัวตารางก่อนนะคะ เพราะ..."

**During guidance**:
- **If multi-row headers**: "เราจะแก้แบบพิเศษนะคะ เพราะหัวตาราง 2 ชั้น - พี่อย่าติ๊ก 'My table has headers' ตอน load นะคะ แล้วต้องลบ auto steps ออกก่อนด้วย"
- Clear steps: "1. เลือกคอลัมน์ Product 2. คลิก Transform tab 3. เลือก Unpivot Other Columns"
- Concept + Action: "เราใช้ Unpivot Other Columns เพราะมันไม่ hardcode ชื่อคอลัมน์ ถ้ามีเดือนเพิ่มมาก็ยังใช้ได้"
- Timely warnings: "⚠️ ระวังนะคะ - ต้อง Fill Down ก่อน Filter เสมอ!"

**Handling questions**:
- If asks "why": Explain concept deeper
- If asks "what if": Discuss alternatives or edge cases
- If stuck: Troubleshoot step-by-step, check for common mistakes

**Closing**:
> "เยี่ยมเลยค่ะ! ตอนนี้ข้อมูลพร้อมใช้งานแล้ว 🎉
> จำไว้นะคะว่า: [key lesson for this case]
> มีอะไรให้ฟ้าช่วยอีกไหมคะ?"

## Key Warnings (Always Emphasize)

⚠️ **🔴 MULTI-ROW HEADERS (CRITICAL!)**: 
- If headers span 2+ rows → **READ `references/multi-row-headers.md` IMMEDIATELY**
- When loading data: **"DO NOT tick 'My table has headers'"**
- After loading: **DELETE auto "Changed Type" and "Promoted Headers" steps**
- **NEVER suggest editing Excel manually** - violates Reproducibility!
- Only use Method 1 or Method 2 from multi-row-headers.md - no custom methods!

⚠️ **Case Sensitivity**: Power Query แยก "Sales" ≠ "sales" ทุกที่

⚠️ **M Code Column Reference**: ถ้าชื่อคอลัมน์มี special characters (/, -, space) ต้องใช้ `[#"Column Name"]` เช่น `[#"Factory/Warehouse"]` ไม่ใช่แค่ `[Factory/Warehouse]`

⚠️ **Lazy Filter**: UI checkbox filter = hardcode values. ใช้ "Remove Empty" หรือ conditional logic แทน

⚠️ **Order Matters**: Fill Down → **แล้วค่อย** Filter (ถ้าทำกลับกันข้อมูล hierarchy จะหาย!)

⚠️ **Always Filter After Fill Down**: หลัง Fill Down ต้อง Remove Empty หรือ Filter ทิ้งแถวซ้ำซ้อน (header rows) - ห้ามลืม!

⚠️ **Unpivot Columns (ห้ามใช้!)**: วิธีบันทึกสูตรมันแปลก ใช้ "Unpivot Other Columns" หรือ "Unpivot Only Selected Columns" แทน

⚠️ **Data-Driven Logic**: ใช้ logic ที่ดูจาก "ข้อมูลมีหรือไม่" (เช่น `if [TXID] = null`) ดีกว่า pattern-based (เช่น `Text.Length = 1`)

⚠️ **Date Locale**: ต้องใช้ "Using Locale" เสมอ มิฉะนั้นวันที่จะผิด (01/12 อาจหมายถึง Dec 1 หรือ Jan 12 ขึ้นอยู่กับ locale!)

⚠️ **Decimal Number Default**: ใช้ Decimal Number เป็น default สำหรับตัวเลข (ราคา, จำนวนเงิน) เพื่อ future-proof - แม้ข้อมูลปัจจุบันจะไม่มีทศนิยม

⚠️ **Auto "Changed Type"**: ลบ step นี้ทิ้งถ้ามัน hardcode ชื่อคอลัมน์ แล้วตั้ง type ใหม่ให้ถูก

⚠️ **Ask Before Removing Columns**: อย่าตัดคอลัมน์ทิ้งโดยไม่ถาม user ก่อน (ยกเว้นที่ชัดเจนเช่น Total rows)

⚠️ **Banker's Rounding**: Power Query ใช้ banker's rounding (0.5 → 0, 1.5 → 2) ไม่ใช่ round ปกติ

## References

**🔴 CRITICAL - Read immediately when multi-row headers detected**:
- `references/multi-row-headers.md` - **Complete guide for multi-row headers** (2 methods with decision framework, step-by-step for transaction vs wide format data, when to use which method). This is the ONLY source of truth for multi-row headers - never make up custom methods!

**Read when diagnosing data**:
- `references/diagnosis-guide.md` - Red flags, problem patterns, checklist for identifying issues

**Read when guiding transformations**:
- `references/transformation-patterns.md` - UI step-by-step for each problem type (wide format, grouped data, etc.)

**Read when user hits issues**:
- `references/common-pitfalls.md` - Common mistakes, gotchas, and recovery strategies

**Read for general guidance**:
- `references/best-practices.md` - Reproducibility principles, future-proofing tips, source management

**Read for inspiration/examples**:
- `references/examples.md` - Real before/after cases with detailed explanations

## Quality Standards

**Good coaching means**:
- **🔴 Immediate recognition of multi-row headers** and reading the dedicated guide before proceeding
- Clear diagnosis (2-3 specific problems, not vague "it's messy")
- Concrete goal visualization (show actual table structure)
- Step-by-step UI guidance (not just "unpivot it")
- **Critical loading instructions**: "DO NOT tick 'My table has headers'" when needed
- **Auto steps removal**: Always delete problematic "Changed Type" and "Promoted Headers"
- Timely warnings (catch mistakes before they happen)
- Prevention advice (help user improve at source)
- **Never suggest manual Excel edits** (violates Reproducibility)

**User should feel**:
- Understood (coach grasps their problem)
- Informed (knows why structure is wrong)
- Guided (has clear path forward)
- Capable (can do it themselves next time)
- Supported (coach is there if they get stuck)

## Notes

- **🔴 Multi-row headers require special handling** - always read `references/multi-row-headers.md` first, never improvise methods
- 70-80% of problems are solvable through UI without writing M code
- When M code is needed, provide clear examples or suggest searching with proper keywords (Text., List., Table., Date., etc.)
- Important data types: List, Record, Table (many users don't know these exist but they're critical)
- Always offer to help user set up query in separate workbook for portability
- If user's real source requires complex ETL, acknowledge limitations and suggest alternatives (manual prep at source, Python preprocessing, etc.)
- **Loading data with multi-row headers**: ALWAYS instruct "DO NOT tick 'My table has headers'" and delete auto steps first


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/multi-row-headers.md

```markdown
# Multi-Row Headers → Single Row Headers

## Table of Contents

1. [When to Use](#when-to-use)
2. [Important Decision Framework](#important-decision-framework)
3. [Method 1: Separate Header + Append (Recommended)](#method-1-separate-header--append-recommended)
4. [Method 2: Transpose Method (Auto but Complex)](#method-2-transpose-method-auto-but-complex)
5. [Method Comparison](#method-comparison)
6. [Quick Decision Tree](#quick-decision-tree)

---

## When to Use

Headers span multiple rows (e.g., Category + Subcategory, Quarter + Metric).

**Examples**:
```
| Customer          |          | Seller         |          |          |
| ID    | Name      | ID       | Name   | Age   |
|-------|-----------|----------|--------|-------|
| C001  | John      | S01      | Alice  | 28    |
```

```
| Q1             | Q1      | Q2             | Q2      |
| Sales | Units  | Sales   | Units   |
|-------|--------|---------|---------|
| 1000  | 50     | 1200    | 60      |
```

---

## Important Decision Framework

**First, identify your data structure:**

### 1. Transaction data (Each row = 1 transaction, already long format)

**Example**: Sales transactions with Customer→ID/Name, Seller→ID/Name/Age

**Goal**: Fix column names only, **NO unpivot needed**

**Recommended**: ✅ **Method 1 (Separate Header + Append)**

**Why**: 
- Simple and reliable
- You just need correct column names
- No need for complex unpivot/pivot operations

---

### 2. Wide format data (Metrics spread across columns)

**Example**: Q1 Sales, Q1 Units, Q2 Sales, Q2 Units

**Goal**: Fix headers **AND** unpivot to long format

**Can use**: 
- ✅ **Method 1** (reliable) - Fix headers first, then unpivot separately
- 🟡 **Method 2** (auto but complex) - Combines header fix + unpivot in one flow

**Why**: 
- Method 1: More steps but clearer, works every time
- Method 2: Fewer queries but complex, many steps, easy to make mistakes

---

### 3. Mixed hierarchy (Some columns 1 level, some 2 levels)

**Example**: Date/Product/Price (1 level) + Customer→ID/Name (2 levels)

**Recommended**: ✅ **Method 1 (Separate Header + Append)**

**Why**: 
- Method 2 (Transpose) doesn't work well with mixed structures
- Method 1 gives you full control over exact column names

---

## Method 1: Separate Header + Append (Recommended)

### When to use
- ✅ Mixed hierarchy
- ✅ Transaction data (no unpivot needed)
- ✅ When you want reliability
- ✅ When you prefer clear, simple steps

### Key principle
Manually create correct 1-row headers, then append with body

### Step-by-Step Guide

#### Step 1: Create "body" query (data only)

```
1. Get Data → From Table/Range
2. ⚠️ DO NOT tick "My table has headers"
3. Delete auto "Changed Type" step (it has wrong column names)
4. Home → Remove Rows → Remove Top Rows → 2 (or however many header rows)
5. Now you have: Column1, Column2, Column3, ... with only data rows
6. Right-click query → Rename → "body"
```

**Result**: Clean data table with generic column names (Column1, Column2, etc.)

---

#### Step 2: Create "header" query (correct headers)

**In Excel**: Create new table with 1 row of correct headers

**Example for transaction data with multi-row headers**:
```
วันที่ | สินค้า | ราคาต่อชิ้น | จำนวนชิ้น | วิธีการชำระเงิน | ลูกค้า_รหัส | ลูกค้า_ชื่อ | ผู้ขาย_รหัส | ผู้ขาย_ชื่อ | ผู้ขาย_อายุ
```

**Example for wide format data that needs unpivot**:
```
Product | Payment | Q1_Sales | Q1_Units | Q2_Sales | Q2_Units
```

**In Power Query**:
```
1. Get Data → From Table/Range (from your 1-row header table)
2. ⚠️ DO NOT tick "My table has headers"
3. Delete auto "Changed Type" step
4. Now you have: Column1, Column2, Column3, ... (same structure as body!)
5. Right-click query → Rename → "header"
```

**Critical**: Header query must have exact same number of columns as body query, with same Column1, Column2, Column3 structure

---

#### Step 3: Append queries

```
1. Select "header" query
2. Home → Append Queries → Append Queries as New
3. First table: header
   Second table: body
4. OK
5. Result: Row 1 = correct headers, Row 2+ = data
6. Right-click new query → Rename → "final"
```

---

#### Step 4: Promote headers and set types

```
1. Home → Use First Row as Headers
2. Set data types:
   - Date columns: Date (Using Locale!)
   - Amount columns: Decimal Number (future-proof!)
   - Count columns: Whole Number (if truly no decimals)
   - Others: Text
3. Done! ✅
```

---

### If you have wide format data: Continue with unpivot

**For wide format** (Q1_Sales, Q1_Units, Q2_Sales, Q2_Units):

```
1. Unpivot:
   - Select "Product" and "Payment" columns
   - Transform → Unpivot Other Columns
   - You get: Attribute (Q1_Sales, Q1_Units, ...) and Value

2. Split Column:
   - Select "Attribute" column
   - Transform → Split Column → By Delimiter "_"
   - Choose "At the left-most delimiter"
   - You get: Attribute.1 (Q1, Q2) and Attribute.2 (Sales, Units)

3. Rename:
   - Attribute.1 → "Quarter"
   - Attribute.2 → "Metric"

4. Pivot Column:
   - Select "Metric" column
   - Transform → Pivot Column
   - Values Column: "Value"
   - Advanced Options → Don't Aggregate
   - Result: Product | Payment | Quarter | Sales | Units

Done! ✅
```

---

### Why Method 1 works

**Concept**: Bypass the complex header structure by creating correct headers separately

**Benefits**:
- ✅ Always works, no matter how complex the hierarchy
- ✅ Clear - you control exactly what headers should be
- ✅ Flexible - works with mixed hierarchy, transaction data, any structure
- ✅ Easy to understand - simple append operation

**Trade-off**: 
- 🟡 Must maintain 1-row header definition yourself
- 🟡 Two queries to manage (header + body)

**Best for**: Mixed hierarchy, transaction data, when you want reliability

---

## Method 2: Transpose Method (Auto but Complex)

### When to use
- ✅ Uniform hierarchy (all columns have same parent-child structure)
- ✅ You want automated solution
- ⚠️ NOT suitable for: Mixed hierarchy, transaction data

### Key principle
Use Transpose to leverage Fill Down (since Fill Right doesn't exist), then manipulate and transpose back

### Step-by-Step Guide

#### Step 1: Separate body (same as Method 1)

```
1. Get Data → From Table/Range
2. DO NOT tick "My table has headers"
3. Delete auto "Changed Type" step
4. Remove Top Rows → N (number of header rows)
5. Rename → "body"
```

---

#### Step 2: Process headers automatically

```
1. Duplicate original query → Rename "headers_raw"
2. Keep Top Rows → 2 (or number of header rows)
3. Delete auto "Changed Type" step
4. Transpose (converts columns to rows)
   - Each original column becomes a row
   - Column1 = parent category, Column2 = subcategory
```

**After Transpose example**:
```
| Column1 | Column2 |
|---------|---------|
| Q1      | Sales   |
| Q1      | Units   |
| Q2      | Sales   |
| Q2      | Units   |
```

```
5. Fill Down on Column1 (fills parent categories)
   - Select Column1
   - Transform → Fill → Down
   - This fills null cells in parent category
```

**After Fill Down**:
```
| Column1 | Column2 |
|---------|---------|
| Q1      | Sales   |
| Q1      | Units   |
| Q2      | Sales   |
| Q2      | Units   |
```

```
6. ✅ CRITICAL: Trim whitespace BEFORE merging (Best Practice!)
   - Select Column1
   - Transform → Format → Trim
   - Select Column2  
   - Transform → Format → Trim
   - This removes leading/trailing spaces from both columns
```

**Why trim first?**
- Headers may have spaces: "Status " (trailing) or " Sales" (leading)
- **Without trim before merge**: "Status " + "-" + " Jan" → "Status - Jan" ❌ (space around dash!)
- **With trim before merge**: "Status" + "-" + "Jan" → "Status-Jan" ✅ (clean delimiter)

```
7. Merge Columns: Column1 + Column2
   - Select Column1
   - Hold Ctrl, select Column2
   - Transform → Merge Columns
   - Separator: Choose delimiter:
     - "_" (underscore) - most common
     - "-" (dash) - also common
     - " " (space) - less recommended
   - Example result: "Q1_Sales", "Q1_Units", "Q2_Sales", "Q2_Units"
```

```
8. Trim delimiter (if blank cells exist)
   - If Row 2 had blank cells (common for Title/Author columns)
   - After merge you'll get: "Title-" or "-Jan" (delimiter at edge)
   - Transform → Replace Values:
     - Find: "^-" (leading dash) - use regex if available
     - Find: "-$" (trailing dash) - use regex if available
   - Or simpler: Right-click column → Transform → use M code
   - Formula bar: Text.Trim([Merged], "-")
   - Example: "Title-" → "Title", "Q1-Sales" → "Q1-Sales" (unchanged)
```

```
9. Transpose back (converts back to columns)
   - Transform → Transpose
   - You now have single-row headers with combined names
10. Rename → "header"
```

---

#### Step 3: Append with body (same as Method 1)

```
1. Select "header" query
2. Home → Append Queries → Append Queries as New
3. First table: header, Second table: body
4. Home → Use First Row as Headers
5. Set data types correctly
```

---

#### Step 4: If wide format, continue with unpivot

For wide format (Q1_Sales, Q1_Units, Q2_Sales, Q2_Units):

```
1. Unpivot Other Columns
   - Select ID columns (Product, Payment, etc.)
   - Transform → Unpivot Other Columns
   - Result: Attribute (Q1_Sales, ...) and Value

2. Split Column by Delimiter "_" (or "-")
   - Select "Attribute"
   - Split by delimiter at left-most delimiter
   - Result: Quarter (Q1, Q2) and Metric (Sales, Units)

3. ⚠️ CHECK: Do you need Index column?
   
   **✅ You DON'T need Index if:**
   - Each group (ID columns + Quarter/Month) has each metric ONLY ONCE
   - Example: Book A + Jan has Sales=1 value, Units=1 value ✅
   - In this case: Skip to step 4 (Pivot) directly
   
   **❌ You NEED Index if:**
   - Same group has same metric MULTIPLE TIMES (transaction data)
   - Example: Phone + Q1 has Sales=2 values (100, 150)
   - AND you want Don't Aggregate (keep all rows separate)
   - In this case: Add Index column
   
   **How to check:**
   - Look at your data before pivot
   - Count: How many rows for "Book A + Jan + Sales"?
   - If 1 row → No Index needed ✅
   - If 2+ rows → Need Index (or use Aggregate) ❌
   
   **If Index is needed:**
   ```
   Add Column → Index Column → From 0
   Pattern depends on your data:
   - From 0: If every row is separate transaction (0,1,2,3,...)
   - Custom pattern: If transactions span multiple rows (1,1,2,2,...)
     (requires M code for complex patterns)
   ```

4. Pivot Column
   - Select "Metric" column
   - Transform → Pivot Column
   - Values Column: "Value"
   - Advanced Options → Don't Aggregate (if each group has 1 value per metric)
   - OR → Choose Aggregate function (Sum, Average, etc.) if needed
   - Result: Product | Payment | Quarter | Sales | Units

5. Remove Index column (if you added it in step 3)
   - Right-click Index → Remove
```

**Notes on Aggregate vs Don't Aggregate:**
- **Don't Aggregate**: Use when each group has unique values
  - Example: Book A + Jan → Sales=1 value, Units=1 value
  - No Index needed in this case ✅
- **Aggregate (Sum/Average/etc.)**: Use when you want to combine multiple values
  - Example: Phone + Q1 → Sales=2 values (100, 150) → Sum = 250
  - No Index needed - aggregation handles multiple values ✅
- **Don't Aggregate + Multiple values**: Requires Index
  - Example: Phone + Q1 → Sales=2 values but want 2 separate rows
  - Must add Index to distinguish transactions ❌

---

### Why Method 2 works

**Concept**: Transpose to use Fill Down (since Fill Right doesn't exist), manipulate, transpose back

**Benefits**:
- ✅ More automated - headers created from original data
- ✅ No need to manually define headers

**Trade-off**:
- ❌ Complex - many steps, easy to make mistakes
- ❌ Doesn't work well with mixed hierarchy
- ❌ May need workarounds for pivot aggregation
- ❌ Harder to troubleshoot if something goes wrong

**Best for**: Uniform hierarchy + wide format data, when comfortable with complex transformations

---

## Method Comparison

| Aspect | Method 1: Separate + Append | Method 2: Transpose |
|--------|------------------------------|---------------------|
| **Reliability** | ✅ Always works | 🟡 Depends on structure |
| **Complexity** | ✅ Simple, clear steps | ❌ Many steps, complex |
| **Mixed hierarchy** | ✅ Perfect for this | ❌ Doesn't work well |
| **Transaction data** | ✅ Recommended | ⚠️ Overkill, not suitable |
| **Wide format data** | ✅ Works (then unpivot) | 🟡 Can work but complex |
| **Maintenance** | 🟡 Must maintain header | ✅ Auto-updates |
| **Beginner-friendly** | ✅ Easy to understand | ❌ Can be confusing |
| **Troubleshooting** | ✅ Easy to debug | ❌ Hard to find issues |

---

## Quick Decision Tree

```
START: Do you have multi-row headers?
  ↓
  ├─→ Mixed hierarchy (some 1-level, some 2-level)?
  │   → ✅ Use Method 1 (Separate + Append)
  │
  ├─→ Transaction data (no unpivot needed)?
  │   → ✅ Use Method 1 (Separate + Append)
  │
  ├─→ Wide format (need to unpivot)?
  │   ├─→ Want reliability?
  │   │   → ✅ Use Method 1 (Separate + Append + Unpivot)
  │   │
  │   └─→ Want automation + comfortable with complexity?
  │       → 🟡 Use Method 2 (Transpose Method)
  │
  └─→ Uniform hierarchy + comfortable with many steps?
      → 🟡 Consider Method 2 (Transpose Method)
      → But Method 1 is still safer!
```

---

## General Recommendation

**Start with Method 1 (Separate + Append)**. It's:
- ✅ Reliable - works for all cases
- ✅ Clear - easy to understand what's happening
- ✅ Maintainable - easy to fix if issues arise

**Use Method 2 only if**:
- You have uniform structure (all columns same hierarchy)
- You want automation (headers update automatically)
- You're comfortable with complex multi-step transformations
- You've tested thoroughly with sample data

---

## Remember

⚠️ **Always test with new data** before trusting the query!

⚠️ **For wide format**: Fix headers FIRST, then unpivot separately - don't try to do everything at once

⚠️ **For transaction data**: Method 1 is simpler - you just need correct column names, no unpivot

⚠️ **When in doubt**: Choose Method 1 (Separate + Append) - it always works! 🎯

```

### references/diagnosis-guide.md

```markdown
# Diagnosis Guide

## Table of Contents

1. [Overview](#overview)
2. [Core Diagnostic Question](#core-diagnostic-question)
3. [Red Flags Checklist](#red-flags-checklist)
   - [Structure Red Flags](#structure-red-flags)
   - [Content Red Flags](#content-red-flags)
4. [Problem Categories](#problem-categories)
   - [Category 1: Wide Format](#category-1-wide-format)
   - [Category 2: Multi-Row Headers](#category-2-multi-row-headers)
   - [Category 3: Blank Cells](#category-3-blank-cells-intentionally-left-empty)
   - [Category 4: Grouped Data](#category-4-grouped-data-hidden-hierarchy)
   - [Category 5: Mixed Metrics (Stacked)](#category-5-mixed-metrics-stacked)
   - [Category 6: Date Issues](#category-6-date-issues)
   - [Category 7: Multiple Tables in One Sheet](#category-7-multiple-tables-in-one-sheet)
5. [Diagnostic Process](#diagnostic-process)
   - [Step 1: Quick Visual Scan](#step-1-quick-visual-scan-30-seconds)
   - [Step 2: Check Fundamentals](#step-2-check-fundamentals-1-min)
   - [Step 3: Identify Primary Problem](#step-3-identify-primary-problem-30-seconds)
   - [Step 4: List Secondary Issues](#step-4-list-secondary-issues-30-seconds)
   - [Step 5: Prioritize Fixes](#step-5-prioritize-fixes-30-seconds)
6. [Example Diagnostic Conversations](#example-diagnostic-conversations)
   - [Example 1: Wide Format + Blank Cells](#example-1-wide-format--blank-cells)
   - [Example 2: Grouped Data](#example-2-grouped-data)
   - [Example 3: Multi-Row Headers](#example-3-multi-row-headers)
7. [Common Diagnostic Mistakes](#common-diagnostic-mistakes)
8. [Tips for Effective Diagnosis](#tips-for-effective-diagnosis)
9. [Remember](#remember)

---

## Overview

This guide helps identify data structure problems quickly and accurately. Use it when analyzing user's data to spot red flags, categorize issues, and prioritize fixes.

## Core Diagnostic Question

**"Can this data be used directly in Pivot Table / Power BI without issues?"**

If answer is NO → identify which problems exist below.

## Red Flags Checklist

Use this checklist when first looking at data:

### Structure Red Flags
- [ ] **Wide format** - Metrics spread across columns (Jan, Feb, Mar... or Product A, Product B...)
- [ ] **Multi-row headers** - Headers span 2+ rows
- [ ] **Blank cells** - Intentionally left blank for visual grouping (similar to merged cells)
- [ ] **Grouped data** - Group headers (A, B, C) inserted between data rows
- [ ] **Subtotals/totals** - Summary rows mixed with detail rows
- [ ] **Multiple tables** - Multiple distinct tables in same sheet
- [ ] **Blank rows/columns** - Separators or formatting spacers

### Content Red Flags
- [ ] **Mixed data types** - Numbers stored as text, or vice versa
- [ ] **Inconsistent granularity** - Some rows are monthly, others are daily
- [ ] **Date format issues** - Dates as text, or ambiguous format (01/12 = Jan 12 or Dec 1?)
- [ ] **Missing values** - Not clearly marked (blank vs zero vs N/A)
- [ ] **Case inconsistency** - "Sales" vs "sales", "Product A" vs "product a"

## Problem Categories

### Category 1: Wide Format

**What it looks like**:
```
| Product | Jan | Feb | Mar | Apr |
|---------|-----|-----|-----|-----|
| A       | 100 | 150 | 200 | 120 |
| B       | 80  | 90  | 110 | 95  |
```

**Why it's wrong**:
- Metrics (Jan, Feb, Mar, Apr) are spread across columns
- Each metric should be a separate column: Product, Month, Value
- Can't filter by month easily
- Can't add new months without changing structure
- Pivot Table sees 4+ separate fields instead of 1 field with months

**Variations**:
- Metrics as columns: Sales Q1, Sales Q2, Units Q1, Units Q2
- Products as columns: Product A, Product B, Product C
- Departments/regions as columns: North, South, East, West

**Fix approach**: Unpivot

**Severity**: 🔴 High (blocks most analysis)

---

### Category 2: Multi-Row Headers

**What it looks like**:
```
| Q1    | Q1    | Q2    | Q2    |
| Sales | Units | Sales | Units |
|-------|-------|-------|-------|
| 1000  | 50    | 1200  | 60    |
```

**Why it's wrong**:
- Headers span multiple rows
- Power Query sees first row as data, not headers
- Column names unclear or missing
- Each topic (Quarter, Sales, Units) not separated properly

**Variations**:
- Category + Subcategory structure
- Time period + Metric structure
- Region + Department structure

**Fix approach**: 
- Simple cases: Transpose → Fill Down → Merge → Transpose back → Promote Headers
- Complex cases: May need Transpose → Fill Down → Merge → Unpivot → Split → Pivot
- Reliable but manual: Separate headers + Append body (requires maintaining header definitions)

**Severity**: 🟡 Medium-High (causes confusion, requires fix before analysis)

---

### Category 3: Blank Cells (Intentionally Left Empty)

**What it looks like**:
```
| Product  | Region | Sales |
|----------|--------|-------|
| Phone    | North  | 100   |
|          | South  | 150   | <- Product cell intentionally blank
|          | East   | 120   |
| Laptop   | North  | 200   |
```

**Why it's wrong**:
- Cells are intentionally left blank for visual grouping (similar to merged cells)
- Only first row of each group has the category value
- Other rows appear blank in Power Query
- Loses group information for most rows
- Inconsistent row counts

**What to tell user**:
"คอลัมน์ Product มีการ**เว้นว่างเอาไว้**ให้ข้อมูลดูง่าย (คล้ายๆ กับการ merged cell) ทำให้แถวที่ 2, 3 ไม่มีข้อมูล Product"

**Fix approach**: Fill Down

**Severity**: 🟡 Medium (easy to fix but data loss if not caught)

---

### Category 4: Grouped Data (Hidden Hierarchy)

**What it looks like**:
```
| Factory/WH | TXID   | Sales |
|------------|--------|-------|
| A          |        |       | <- Group header
| WH-001     | TX0001 | 100   |
| WH-002     | TX0002 | 150   |
| B          |        |       | <- Group header
| WH-003     | TX0003 | 200   |
```

**Why it's wrong**:
- Group headers (A, B) inserted as separate rows
- Data rows missing factory information
- Empty cells in data columns for group headers
- Hierarchy is implicit, not explicit

**Variations**:
- Department headers with employee rows
- Category headers with product rows
- Date headers with transaction rows

**Fix approach**: 
1. Fill Down (to get group into every row)
2. Filter out empty rows (remove group headers)
3. Conditional Column (to separate hierarchy levels if needed)

**Critical**: Must Fill Down **before** filtering, or hierarchy data will be lost!

**Severity**: 🔴 High (hierarchy information lost if handled wrong)

---

### Category 5: Mixed Metrics (Stacked)

**What it looks like**:
```
| Product | Payment | Attribute | Value |
|---------|---------|-----------|-------|
| Phone   | Cash    | Sales     | 1000  |
| Phone   | Cash    | Units     | 50    |
| Phone   | Card    | Sales     | 1200  |
| Phone   | Card    | Units     | 60    |
```

**Why it's wrong**:
- Multiple metrics (Sales, Units) stacked in same column
- Each metric should be separate column
- Hard to calculate (Sales / Units) in Pivot
- Requires multiple steps to filter

**Fix approach**: Pivot Column (with "Don't Aggregate" option)

**Severity**: 🟡 Medium (usable but inefficient)

---

### Category 6: Date Issues

**What it looks like**:
```
| Date       | Sales |
|------------|-------|
| 01/12/2024 | 100   | <- Is this Jan 12 or Dec 1?
| 2024-12-01 | 150   | <- Different format
| 1/12/24    | 200   | <- Short year
```

**Why it's wrong**:
- Ambiguous interpretation (US: MM/DD, UK/TH: DD/MM)
- Mixed formats in same column
- Stored as text instead of date type
- Wrong date values if locale assumed incorrectly

**Variations**:
- Multiple locales in same file
- Dates as text "January 1, 2024"
- Excel serial numbers (44927)
- Different separators: / vs - vs .

**Fix approach**: 
- Always use "Using Locale" when setting date type
- If mixed locales: may need custom logic to detect and parse correctly

**Severity**: 🔴 High (completely wrong analysis if locale is wrong)

---

### Category 7: Multiple Tables in One Sheet

**What it looks like**:
```
Sales Table (rows 1-10)

Commission Table (rows 15-20)

Totals (row 25)
```

**Why it's wrong**:
- Cannot connect to single source
- Mixing different granularities
- Unclear boundaries between tables

**Fix approach**: 
- If truly separate topics: Separate into different sheets/files
- If related: Combine with proper relationship
- Often indicates source file should be redesigned

**Severity**: 🔴 High (requires source restructuring)

---

## Diagnostic Process

### Step 1: Quick Visual Scan (30 seconds)

Look for obvious red flags:
1. Headers span multiple rows? → Multi-row headers
2. Months/products as columns? → Wide format
3. Empty rows between groups? → Grouped data
4. Merged cells? → Merged cells
5. Multiple distinct tables? → Multiple tables

### Step 2: Check Fundamentals (1 min)

Ask these questions:
1. **One header row?** If no → Multi-row headers or merged cells
2. **One column = one concept?** If no → Wide format or mixed metrics
3. **Consistent rows?** If no → Subtotals, grouped data, or multiple tables
4. **Correct data types?** If no → Date issues, text numbers

### Step 3: Identify Primary Problem (30 seconds)

What's the BIGGEST issue blocking analysis?
- Usually: Wide format OR Multi-row headers OR Grouped data
- These often need fixing before anything else

### Step 4: List Secondary Issues (30 seconds)

What else needs attention after structure is fixed?
- Data type corrections
- Date locale settings
- Removing auto-generated steps
- Case consistency

### Step 5: Prioritize Fixes (30 seconds)

**Order matters!**
1. **Structure first**: Wide format, multi-row headers, merged cells
2. **Content next**: Data types, dates, cleaning
3. **Optimization last**: Removing redundant steps, performance tuning

## Example Diagnostic Conversations

### Example 1: Wide Format + Blank Cells

**User shows**:
```
| Product  | Jan | Feb | Mar |
|----------|-----|-----|-----|
| Phone    | 100 | 150 | 200 |
|          | 80  | 90  | 110 |
| Laptop   | 200 | 220 | 250 |
```

**Diagnosis**:
> "เห็นปัญหา 2 อย่างค่ะ:
> 1. **Wide format** - เดือน (Jan, Feb, Mar) แยกเป็นคอลัมน์ ทำให้ Pivot Table เห็น 3 fields แยกกัน
> 2. **Blank cells** - คอลัมน์ Product มีการเว้นว่างเอาไว้ (คล้ายๆ กับการ merged cell) ทำให้แถวที่ 2 ไม่มีข้อมูล Product
> 
> เราจะแก้แบบนี้นะคะ:
> 1. Fill Down ก่อน (เติม Product)
> 2. Unpivot Other Columns (แปลง wide → long)"

---

### Example 2: Grouped Data

**User shows**:
```
| Factory/WH | Sales |
|------------|-------|
| A          |       |
| WH-001     | 100   |
| WH-002     | 150   |
| B          |       |
| WH-003     | 200   |
```

**Diagnosis**:
> "นี่คือ **grouped data** ค่ะ - Factory (A, B) ถูกแทรกเป็น group headers แยก
> 
> ปัญหาคือ:
> - แถว WH-001, WH-002 ไม่รู้ว่าอยู่ Factory ไหน
> - แถว A, B ไม่มีข้อมูล Sales
> 
> ⚠️ **สำคัญมาก**: ต้อง Fill Down **ก่อน** Filter เสมอ!
> ถ้า filter ก่อน → ข้อมูล Factory จะหายไปเลย"

---

### Example 3: Multi-Row Headers

**User shows**:
```
| Q1    | Q1    | Q2    | Q2    |
| Sales | Units | Sales | Units |
|-------|-------|-------|-------|
| 1000  | 50    | 1200  | 60    |
```

**Diagnosis**:
> "นี่คือ **multi-row headers** ค่ะ - หัวตาราง 2 ชั้น (Quarter + Metric)
> 
> ปัญหาคือ:
> - ไม่สามารถ filter ตาม Quarter ได้
> - Sales กับ Units ไม่ได้แยกชัดเจน
> - Structure แบบ wide
> 
> เป้าหมายคือได้ 3 คอลัมน์: Quarter, Sales, Units
> เราจะแก้หัวก่อน แล้วค่อย unpivot + pivot"

---

## Common Diagnostic Mistakes

### ❌ Mistake: Diagnosing too vaguely

**Bad**: "ข้อมูลนี้มีปัญหา structure ค่ะ ต้องแก้"

**Good**: "เห็นปัญหา 2 อย่าง: 1) Wide format - เดือนแยกเป็นคอลัมน์ 2) Merged cells ในคอลัมน์ Product"

### ❌ Mistake: Missing impact explanation

**Bad**: "ข้อมูลนี้เป็น wide format"

**Good**: "ข้อมูลนี้เป็น wide format ทำให้ Pivot Table จะเห็น 4 fields แยกกัน (Jan, Feb, Mar, Apr) แทนที่จะเป็น 1 field Month ที่ filter ได้"

### ❌ Mistake: Not prioritizing

**Bad**: [Lists 5 problems without saying which to fix first]

**Good**: "เราจะแก้หัวตารางก่อนนะคะ (ปัญหาที่ 1 และ 2) แล้วค่อยมาแก้ data type ทีหลัง (ปัญหาที่ 3)"

### ❌ Mistake: Assuming fix without asking

**Bad**: [Immediately starts explaining unpivot steps]

**Good**: "พี่ต้องการให้ข้อมูลมีหน้าตายังไงคะ? ใช้กับ Pivot Table หรือ Power BI?"

## Tips for Effective Diagnosis

1. **Always ask about source first** - "Source จริงๆ คืออะไรคะ?" This changes the approach completely
2. **Show concrete impact** - Don't just say "it's wrong", show what will happen
3. **Use visuals** - Draw before/after tables to make it clear
4. **Prioritize ruthlessly** - Fix structure before content
5. **Check for hidden issues** - Case sensitivity, locale, hardcoded steps come later but matter

## Remember

Good diagnosis:
- **Specific** (2-3 concrete problems, not "messy data")
- **Prioritized** (what to fix first)
- **Impact-focused** (explain what will happen)
- **Solution-oriented** (hint at fix approach)

The goal: User understands their problems and feels confident about the path forward! 🎯

```

### references/transformation-patterns.md

```markdown
# Transformation Patterns

## Table of Contents

1. [Overview](#overview)
2. [Pattern 1: Wide Format → Long Format (Unpivot)](#pattern-1-wide-format--long-format-unpivot)
3. [Pattern 2: Multi-Row Headers → Single Row Headers](#pattern-2-multi-row-headers--single-row-headers)
4. [Pattern 3: Blank Cells → Filled Data](#pattern-3-blank-cells--filled-data)
5. [Pattern 4: Group Headers → Individual Rows](#pattern-4-group-headers--individual-rows)
6. [Pattern 5: Pivot Column](#pattern-5-pivot-column)
7. [Pattern 6: Date Locale Fixes](#pattern-6-date-locale-fixes)
8. [Pattern 7: Remove Auto-Generated Steps](#pattern-7-remove-auto-generated-steps)
9. [Pattern 8: Combine Files](#pattern-8-combine-files-multiple-files-with-same-structure)
10. [UI Quick Reference](#ui-quick-reference)

---

## Overview

This guide provides step-by-step UI instructions for transforming common data problems in Power Query. Each pattern includes:
- When to use it
- Detailed UI steps
- Why each step matters (concept + action)
- Common variations
- Warnings and gotchas

## Pattern 1: Wide Format → Long Format (Unpivot)

### When to Use

Data has metrics spread across columns (months, products, regions) and you need them in rows instead.

**Example**: Jan, Feb, Mar as columns → Month as a single column with values in another column

### Decision: Which Unpivot Method?

Power Query has 3 unpivot options:

**1. Unpivot Columns** ❌ **NEVER USE THIS**
- Records formula strangely
- Causes issues later
- Not recommended by experts

**2. Unpivot Other Columns** ✅ **USE WHEN**: You know which columns should stay fixed
- Select columns that won't change (Product, Category, ID)
- These stay as-is, everything else gets unpivoted
- Future-proof if new columns added (they'll auto-unpivot)
- **Best for**: Known ID/dimension columns that stay stable

**3. Unpivot Only Selected Columns** ✅ **USE WHEN**: You know which columns to unpivot
- Select columns that should be unpivoted (Jan, Feb, Mar)
- ID columns stay as-is
- Future-proof if those specific columns won't change
- **Best for**: Known measure columns that are stable

**Best Practice**: Use "Unpivot Other Columns" OR "Unpivot Only Selected Columns" - choose based on what you expect to change in future data.

### Step-by-Step: Unpivot Other Columns

**Example data**:
```
| Product | Region | Jan | Feb | Mar |
|---------|--------|-----|-----|-----|
| Phone   | North  | 100 | 150 | 200 |
```

**Steps**:

1. **Select ID columns** (columns that should NOT be unpivoted)
   - Click on "Product" column header
   - Hold Ctrl and click "Region" column header
   - These will stay as separate columns

2. **Access unpivot**
   - Click **Transform** tab in ribbon
   - In "Any Column" group, click **Unpivot Columns** dropdown arrow

3. **Choose "Unpivot Other Columns"**
   - Click **Unpivot Other Columns**
   - Jan, Feb, Mar will be transformed into two columns:
     - "Attribute" (contains: Jan, Feb, Mar)
     - "Value" (contains: 100, 150, 200)

4. **Rename columns**
   - Double-click "Attribute" header → rename to "Month"
   - Double-click "Value" header → rename to "Sales"

5. **Verify result**
   ```
   | Product | Region | Month | Sales |
   |---------|--------|-------|-------|
   | Phone   | North  | Jan   | 100   |
   | Phone   | North  | Feb   | 150   |
   | Phone   | North  | Mar   | 200   |
   ```

**Why this works**:
- **Concept**: Unpivot converts columns to rows, creating long format
- **Future-proof**: If April is added, it'll automatically unpivot
- **Analysis-ready**: Can now filter by Month, calculate trends, use in Pivot Table easily

**Common variation**: If months might get added (Apr, May) → "Unpivot Other Columns" handles it automatically

---

## Pattern 2: Multi-Row Headers → Single Row Headers

### When to Use

Headers span multiple rows (e.g., Category + Subcategory, Quarter + Metric).

**Example structures**:
```
| Customer    |       | Seller    |       |       |
| ID   | Name | ID    | Name | Age  |
```

```
| Q1         | Q1    | Q2         | Q2    |
| Sales | Units | Sales | Units |
```

### Decision Framework

**First, identify your data structure:**

1. **Transaction data?** (Each row = 1 transaction, already long format)
   - **Goal**: Fix column names only, NO unpivot needed
   - **Use**: Method 1 (Separate Header + Append)

2. **Wide format data?** (Metrics spread across columns)
   - **Goal**: Fix headers AND unpivot to long format
   - **Use**: Method 1 (reliable) or Method 2 (auto but complex)

3. **Mixed hierarchy?** (Some columns 1 level, some 2 levels)
   - **Use**: Method 1 (Separate Header + Append)

### Two Methods Available

**Method 1: Separate Header + Append** ✅ **Recommended**
- Manually create correct 1-row headers, then append with body
- **Best for**: Mixed hierarchy, transaction data, reliability
- **Pros**: Simple, always works, easy to troubleshoot
- **Cons**: Must maintain header definition

**Method 2: Transpose Method** 🟡 **Advanced**
- Use Transpose + Fill Down + Transpose back to auto-generate headers
- **Best for**: Uniform hierarchy, automated solution
- **Pros**: Automated, headers update from source
- **Cons**: Complex, many steps, doesn't work with mixed hierarchy

### Complete Guide

**📖 For detailed step-by-step instructions, decision trees, and examples:**

**→ See `references/multi-row-headers.md`**

That file contains:
- Complete step-by-step for both methods
- When to use which method
- Quick decision tree
- Method comparison table
- Troubleshooting tips
- Examples for transaction vs wide format data

---

## Pattern 3: Blank Cells → Filled Data

### When to Use

Cells are intentionally left blank for visual grouping (similar to merged cells functionality), and only first row of each group has data.

**Example**:
```
| Product | Sales |
|---------|-------|
| Phone   | 100   |
|         | 150   | <- Product is blank (left empty for visual grouping)
|         | 120   |
| Laptop  | 200   |
```

**Steps**:

1. **Select the column with blank cells**
   - Click the column header (e.g., "Product")

2. **Fill Down**
   - Right-click the column → **Fill** → **Down**
   - All blank cells get filled with value from above
   - Result:
     ```
     | Product | Sales |
     |---------|-------|
     | Phone   | 100   |
     | Phone   | 150   |
     | Phone   | 120   |
     | Laptop  | 200   |
     ```

**Why this works**:
- **Concept**: Fill Down copies value from above into empty cells
- **Critical**: Do this BEFORE any filtering or removing rows
- **Use case**: Blank cells are common in Excel reports where categories are shown once for visual grouping

**What to tell user**:
"คอลัมน์นี้มีการเว้นว่างเอาไว้ให้ข้อมูลดูง่าย (คล้ายๆ กับการ merged cell) ทำให้แถวที่ 2, 3 ไม่มีข้อมูล เราจะใช้ Fill Down เพื่อเติมข้อมูลให้ทุกแถวนะคะ"

**Warning**: If you filter or remove rows before Fill Down, you lose the category information forever!

---

## Pattern 4: Grouped Data → Explicit Hierarchy

### When to Use

Group headers are inserted as separate rows between data (Factory A, Factory B, etc.).

**Example**:
```
| Factory/WH | TXID   | Sales |
|------------|--------|-------|
| A          |        |       | <- Group header
| WH-001     | TX0001 | 100   |
| WH-002     | TX0002 | 150   |
| B          |        |       | <- Group header
| WH-003     | TX0003 | 200   |
```

**Critical Rule**: **Fill Down → THEN → Filter** (never the other way!)

**Steps**:

1. **Fill Down the hierarchy column**
   - Select "Factory/WH" column
   - Right-click → **Fill** → **Down**
   - Now every row has Factory code:
     ```
     | Factory/WH | TXID   | Sales |
     |------------|--------|-------|
     | A          |        |       |
     | A          | TX0001 | 100   | <- Got "A" filled
     | A          | TX0002 | 150   |
     | B          |        |       |
     | B          | TX0003 | 200   | <- Got "B" filled
     ```

2. **Separate hierarchy levels** (if needed)
   - If Factory and Warehouse are in same column, use data-driven logic:
   
   **💡 Best Method: Data-Driven Logic**
   
   **Observe what distinguishes header rows from data rows**:
   - Factory header rows (A, B, C) → **No TXID** (null/empty)
   - Warehouse data rows (WH-001, WH-002) → **Have TXID** (TX0001, TX0002)
   
   **This is the key insight!** Use data characteristics, not naming patterns.
   
   - Click **Add Column** tab → **Conditional Column**
   - Name: "Factory"
   - If `[TXID] = null` then `[Factory/WH]` else `null`
   - This extracts Factory row values when there's no TXID
   - Fill Down the new "Factory" column
   - The original column becomes "Warehouse"
   
   **Why this is best**:
   - ✅ Works with ANY factory name (A, AA, North, Site-01, anything!)
   - ✅ Data-driven, not pattern-based
   - ✅ Future-proof for new factories
   - ✅ Clear logic: "No TXID = factory header"
   
   **Less flexible alternatives**:
   - ❌ `Text.Length([Factory/WH]) = 1` - Only works for single-character names
   - ❌ `= "A" or "B" or "C"` - Hardcoded, must update for new values

3. **⚠️ Filter out empty rows (DO NOT SKIP THIS STEP!)**
   - Click "TXID" column
   - Click filter dropdown (funnel icon)
   - Click **Remove Empty**
   - This removes group header rows (A, B, C lines)
   - **Critical**: Without this step, you have duplicate/redundant rows!

4. **Clean up**
   - Remove or rename columns as needed
   - Set correct data types (Decimal Number for amounts!)

**Result**:
```
| Factory | Warehouse | TXID   | Sales |
|---------|-----------|--------|-------|
| A       | WH-001    | TX0001 | 100   |
| A       | WH-002    | TX0002 | 150   |
| B       | WH-003    | TX0003 | 200   |
```

**Why this works**:
- **Fill Down**: Makes implicit hierarchy (visual grouping) explicit (data in every row)
- **Data-driven logic**: Checks actual data characteristics (has TXID?) not patterns (text length)
- **Filter after Fill Down**: Removes now-useless group header rows
- **Flexible**: Works with any naming convention

**⚠️ Critical Warnings**: 
1. If you filter BEFORE Fill Down, the Factory information is lost forever! Always Fill Down first.
2. If you skip the Filter step, you'll have duplicate rows (the A, B, C header rows remain)

---

## Pattern 5: Stacked Metrics → Separate Columns (Pivot)

### When to Use

Multiple metrics (Sales, Units, etc.) are stacked in rows instead of being in separate columns.

**Example**:
```
| Product | Payment | Attribute | Value |
|---------|---------|-----------|-------|
| Phone   | Cash    | Sales     | 1000  |
| Phone   | Cash    | Units     | 50    |
| Phone   | Card    | Sales     | 1200  |
| Phone   | Card    | Units     | 60    |
```

**Goal**: Separate Sales and Units into their own columns

**Steps**:

1. **Select the Attribute column**
   - Click "Attribute" column header
   - This contains the metric names (Sales, Units)

2. **Pivot Column**
   - Click **Transform** tab → **Pivot Column**
   - Dialog appears

3. **Configure pivot**
   - **Values Column**: Select "Value" (contains the numbers)
   - **Advanced Options**: Click to expand
   - **Aggregate Value Function**: Select **Don't Aggregate**
   - Click OK

4. **Result**:
   ```
   | Product | Payment | Sales | Units |
   |---------|---------|-------|-------|
   | Phone   | Cash    | 1000  | 50    |
   | Phone   | Card    | 1200  | 60    |
   ```

**Why this works**:
- **Concept**: Pivot converts unique values in one column into separate columns
- **Don't Aggregate**: Critical! We want values as-is, not summed/averaged
- **Use case**: When metrics are stacked vertically but should be horizontal

**Common variation**: After unpivoting wide data, you might need to split and pivot again to get correct structure

---

## Pattern 6: Date Locale Fixes

### When to Use

Dates are ambiguous (01/12 could be Jan 12 or Dec 1) or stored as text.

**Example**:
```
| Date       | Sales |
|------------|-------|
| 01/12/2024 | 100   | <- Jan 12 (US) or Dec 1 (UK/TH)?
```

**Steps**:

1. **Select the Date column**
   - Click "Date" column header

2. **Change Type with Locale**
   - Click **Transform** tab
   - Click data type dropdown → **Date**
   - **DO NOT** just click "Date" directly!

3. **Specify Using Locale**
   - After selecting Date type, dialog appears
   - OR: Click **Transform** → **Data Type** → **Using Locale**
   - Choose correct locale:
     - "English (United States)" for MM/DD/YYYY
     - "English (United Kingdom)" for DD/MM/YYYY  
     - "Thai" for DD/MM/YYYY (Thailand standard)

4. **Verify**
   - Check a few dates to ensure correct interpretation
   - 01/12/2024 should become expected date

**Why this works**:
- **Concept**: Locale tells Power Query how to interpret ambiguous dates
- **Critical**: Without locale, Power Query guesses (often wrong!)
- **Use case**: Any date data, especially from international sources

**⚠️ Warning**: If you have MIXED locales in same column (some US, some UK), you'll need custom M code to detect and parse correctly - this is advanced!

**Best practice**: Always use "Using Locale" when setting date types, even if it seems clear.

---

## Pattern 7: Remove Auto-Generated Steps

### When to Use

Power Query auto-generates "Changed Type" steps that hardcode column names, which break when source changes.

**Problem**:
- You rename/add columns
- Old "Changed Type" step still references old column names
- Query breaks on refresh

**Steps**:

1. **Find the step**
   - Look at **Applied Steps** panel (right side)
   - Find "Changed Type" steps (usually multiple)
   - Look at formula bar to see if it has hardcoded column names

2. **Delete problematic steps**
   - Right-click the "Changed Type" step
   - Click **Delete**
   - If dialog appears about dependencies, review carefully

3. **Re-apply types correctly**
   - Select columns that need type changes
   - Set correct data type
   - For dates: Use "Using Locale"
   - This creates new "Changed Type" step with current structure

**Why this works**:
- **Problem**: Auto-steps lock in column names at that point in time
- **Solution**: Remove old locks, create new ones after transformations
- **Best practice**: Always check Applied Steps after major transformations

**Gotcha**: Sometimes you need these steps! Don't delete if they're still correct.

---

## Pattern 8: Combine Files (Multiple Files with Same Structure)

### When to Use

Multiple files (Excel, CSV) in a folder need to be combined into one table.

### Method A: From Folder (Local)

**Steps**:

1. **Get Data from Folder**
   - **Home** tab → **Get Data** → **From Folder**
   - Browse to folder containing files
   - Click OK

2. **Combine Files**
   - In preview, click **Combine** button (bottom right)
   - Or click **Transform Data** then **Combine Files** in toolbar
   - Power Query creates:
     - "Transform Sample File" query (template for transformations)
     - Main query that applies template to all files

3. **Transform the sample**
   - Click "Transform Sample File" query
   - Make any needed transformations (unpivot, remove rows, etc.)
   - These will apply to ALL files automatically

4. **Go back to main query**
   - Click main combined query
   - All files now processed with your transformations

**Why this works**:
- **Transform Sample File**: Template that's applied to every file
- **Automatic**: New files added to folder get auto-included on refresh
- **Consistent**: Ensures same transformations on all files

---

### Method B: From SharePoint Folder (Online)

**Steps**:

1. **Get Data from SharePoint Folder**
   - **Home** tab → **Get Data** → **From SharePoint Folder**
   - Enter SharePoint site URL
   - Click OK

2. **CRITICAL: Change M Code**
   - Click **Advanced Editor**
   - Find: `SharePoint.Files`
   - Change to: `SharePoint.Contents`
   - Click Done
   - *This is required for SharePoint folders to work properly*
   - Reference: https://www.thepexcel.com/power-query-get-data-online-sources/

3. **Filter to your files**
   - Filter "Folder Path" to target folder
   - Filter "Name" or "Extension" if needed

4. **Combine Files**
   - Same as Method A from here
   - Click Combine, transform sample, etc.

**Why the code change**:
- `SharePoint.Files` has limitations with folders
- `SharePoint.Contents` works more reliably
- This is a known issue/workaround

---

### Method C: Custom Function (Multiple Sheets in One File)

**When to use**: Need to combine Sheet1, Sheet2, Sheet3 from a single Excel file

**Note**: No built-in UI for this - requires creating custom function

**High-level approach**:
1. Create function that processes one sheet
2. Get list of sheet names
3. Apply function to each sheet
4. Combine results

**Details**: This requires M code. If user needs this, suggest they search for "Power Query custom function multiple sheets" or provide basic template.

---

## UI Quick Reference

### Common Locations:

**Transform Tab**:
- Unpivot Columns (and dropdown for Other/Only Selected)
- Transpose
- Fill (Right-click column → Fill → Down/Up)
- Split Column
- Extract (First/Last Characters, Text Before/After)
- Data Type (with "Using Locale" option)

**Add Column Tab**:
- Conditional Column
- Custom Column
- Duplicate Column

**Home Tab**:
- Remove Rows / Keep Rows
- Use First Row as Headers
- Combine Files

**Transform > Any Column Group**:
- Pivot Column
- Unpivot Columns (dropdown)

**Right-Click Column**:
- Fill → Down/Up
- Remove
- Duplicate Column
- Replace Values
- Split Column
- Change Type (with Using Locale)

---

## Tips for Effective Transformations

1. **Always check Applied Steps** - Understand what each step does
2. **Remove auto "Changed Type" after major changes** - Prevents breaking on refresh
3. **Fill Down before filtering** - Or you lose hierarchy information
4. **Use "Using Locale" for all dates** - Prevents misinterpretation
5. **Choose right unpivot method** - Think about future data changes
6. **Don't Aggregate when pivoting** - Usually want values as-is
7. **Test with new data** - Add a new month/product and refresh to verify future-proofing

## Remember

Good transformations are:
- **Future-proof** (handle new columns/rows automatically)
- **Clear** (each step has obvious purpose)
- **Robust** (don't hardcode values that might change)
- **Efficient** (minimum steps to achieve goal)

The goal: Transform data reliably so users can refresh anytime! 🔄✨

```

### references/best-practices.md

```markdown
# Best Practices for Power Query

## Table of Contents

1. [Overview](#overview)
2. [Core Principles](#core-principles)
   - [1. Reproducibility: Eliminate Manual Steps](#1-reproducibility-eliminate-manual-steps)
   - [2. Portability: Query Workbook Separation](#2-portability-query-workbook-separation)
   - [3. Find the True Source](#3-find-the-true-source)
   - [4. Future-Proofing: Design for Change](#4-future-proofing-design-for-change)
   - [5. Headers Before Everything](#5-headers-before-everything)
   - [6. Document and Organize](#6-document-and-organize)
   - [7. Keep It Simple](#7-keep-it-simple)
3. [Specific Practices](#specific-practices)
   - [Working with Dates](#working-with-dates)
   - [Working with Numbers](#working-with-numbers)
   - [Working with Text](#working-with-text)
   - [Combining Data](#combining-data)
   - [Performance Optimization](#performance-optimization)
4. [Workflow Best Practices](#workflow-best-practices)
   - [Starting a New Project](#starting-a-new-project)
   - [Maintaining Existing Queries](#maintaining-existing-queries)
5. [Common Patterns](#common-patterns)
   - [Pattern: Monthly Report Refresh](#pattern-monthly-report-refresh)
   - [Pattern: Combining Multiple Sources](#pattern-combining-multiple-sources)
   - [Pattern: Lookup/Reference Tables](#pattern-lookupreference-tables)
6. [Checklist for "Good" Queries](#checklist-for-good-queries)
7. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
8. [Learning Path](#learning-path)
9. [Remember](#remember)

---

## Overview

This guide covers principles and practices that lead to robust, maintainable, and future-proof Power Query solutions. Use this for general guidance and to help users build better habits.

## Core Principles

### 1. Reproducibility: Eliminate Manual Steps

**The Golden Rule**: If you can't press "Refresh" and get updated results automatically, it's not truly automated.

**What reproducibility means**:
- No manual copying/pasting between files
- No manual formatting in Excel before loading
- No "open file → delete rows → save → load" workflows
- Everything automated in Power Query

**Why it matters**:
```
❌ Bad workflow:
1. Export from system → CSV
2. Open in Excel
3. Manually delete header rows
4. Fix column names
5. Save as .xlsx
6. Load into Power Query

✅ Good workflow:
1. Load CSV directly into Power Query
2. Remove top rows in query
3. Promote/rename headers in query
4. Set types in query
5. Refresh anytime
```

**How to achieve it**:
- **Find the true source**: CSV, database, API, not manually edited files
- **Move all prep into query**: Every manual step should become a Power Query step
- **Use From Folder**: For multiple files, don't manually combine
- **Parameter**: Use parameters for changing paths/dates instead of editing query

**Questions to ask**:
- "Where does this data really come from?"
- "What manual steps do you do before loading?"
- "If new data arrives tomorrow, can you just refresh?"

**Benefits**:
- Save time (no repetitive manual work)
- Reduce errors (automation is consistent)
- Scale easily (handle 10 or 1000 files the same way)
- Documentation (query shows exactly what's done)

---

### 2. Portability: Query Workbook Separation

**The Principle**: Create Power Query transformations in a separate workbook from your data source.

**Why separate**:
```
❌ Bad: Query and data in same file
data_march.xlsx:
  - Sheet1: Raw data
  - Queries: Transformations
  
→ Hard to copy queries to data_april.xlsx
→ Hard to move to Power BI
→ Have to redo transformations each time

✅ Good: Query workbook separate
transform_pipeline.xlsx:
  - Only queries, no data
  - Points to external source
  
→ Copy queries to any new file
→ Move to Power BI easily
→ One set of transformations, many uses
```

**How to set up**:
1. Create new blank workbook: "Data_Transformations.xlsx"
2. Get Data from external source (CSV, SharePoint, folder)
3. Build all transformations
4. Save query workbook
5. Copy queries to target workbook or Power BI as needed

**Benefits**:
- **Reusability**: Same queries for different periods/regions
- **Flexibility**: Easy to move between Excel and Power BI
- **Maintenance**: Update queries in one place
- **Sharing**: Share query file, not data

**Real-world example**:
```
Monthly Sales Report:
- data_pipeline.xlsx (queries only)
  - Connects to SharePoint/sales-data-202410
  - Transformations: unpivot, clean, aggregate
  
- October_Report.xlsx
  - Loads from data_pipeline queries
  - Creates Pivot Tables
  
- November_Report.xlsx
  - Same queries, different month
  - Just change source path parameter
```

---

### 3. Find the True Source

**The Question**: "Where does this data REALLY come from?"

**Common wrong answers**:
- "I got this Excel file from my colleague"
- "It's a report that gets sent to me"
- "I download and clean it manually"

**Right answers dig deeper**:
- "It's exported from SAP as CSV"
- "It's in our SharePoint folder"
- "It comes from Salesforce API"
- "Database pulls it nightly"

**Why this matters**:
```
Scenario: User shows Excel file with neat data
Wrong: Load from Excel → Can't refresh
Right: Find it's from SharePoint → Load from SharePoint → Can refresh

Scenario: User shows manually combined sheets
Wrong: Use their combined file → Manual work on new data
Right: Get original sheets → Combine Files → Automatic on new data
```

**Questions to ask**:
1. "Where did this file come from originally?"
2. "How do you get updated data?"
3. "Does someone send this, or is it in a shared location?"
4. "Is this manually edited, or straight from the source?"

**Red flags (not true source)**:
- "I delete these rows first"
- "I fix the headers in Excel"
- "I copy from multiple sheets"
- "Someone emails it to me"

**Goal**: Connect Power Query to the actual source system (database, API, shared folder) not to manually prepared files.

**Benefits**:
- Refresh works automatically
- No manual steps to forget
- Always current data
- Audit trail clear

---

### 4. Future-Proofing: Design for Change

**The Mindset**: Data changes. New columns appear. New categories get added. Query should handle it.

**How to future-proof**:

**Use "Unpivot Other Columns"**:
```
❌ Hardcoded: Unpivot Jan, Feb, Mar
   → Apr added next month → Breaks

✅ Future-proof: Unpivot Other Columns (keep Product, Region)
   → Apr automatically unpivoted
```

**Use Dynamic Filters**:
```
❌ Hardcoded: Keep rows where Product = "A" or "B" or "C"
   → Product D added → Excluded

✅ Dynamic: Remove Empty, or keep where Sales > 0
   → Product D automatically included
```

**Avoid Hardcoded Column Names**:
```
❌ Changed Type with specific columns
   → Column renamed → Breaks

✅ Delete old Changed Type after transformation
   → Re-apply with new structure
```

**Use Parameters**:
```
❌ Hardcoded path: C:\Reports\2024\October\data.csv
   → November comes → Have to edit query

✅ Parameter: FilePath = "C:\Reports\" & Year & "\" & Month & "\data.csv"
   → Change parameter, not query
```

**Think about**:
- "What if a new column is added?"
- "What if values change?"
- "What if structure is slightly different?"
- "Will this work next month/year?"

**Test future-proofing**:
1. Add a test column to source
2. Refresh query
3. Does it handle it correctly?

---

### 5. Headers Before Everything

**The Rule**: Fix header structure BEFORE worrying about data quality.

**Why**:
```
Wrong order:
1. Fix data types → Applied to wrong columns
2. Clean data → Cleaning wrong structure  
3. Fix headers → Everything breaks
4. Have to redo steps 1-2

Right order:
1. Fix headers → Get structure right
2. Fix data types → Apply to correct columns
3. Clean data → Clean correct data
```

**Common scenarios**:

**Wide format + multi-row headers**:
- Fix BOTH at same time
- Get to single-row, long format
- Then worry about types and cleaning

**Merged cells**:
- Fill Down immediately
- Before any filtering or calculations

**Grouped data**:
- Fill Down hierarchy
- Remove group headers
- Then work with clean structure

**Why this matters**:
- Column names change during structure fixes
- Old "Changed Type" steps break
- Transformations apply to wrong columns

**Remember**: Structure first, content second

---

### 6. Document and Organize

**Name things clearly**:
```
❌ Bad query names:
- Query1
- Table_1
- Step 5

✅ Good query names:
- Sales_Raw
- Sales_Cleaned
- Sales_Aggregated
- Product_Lookup
```

**Rename steps**:
```
❌ Bad step names:
- Changed Type
- Changed Type1
- Changed Type2

✅ Good step names:
- Set_Initial_Types
- Remove_Empty_Rows
- Unpivot_Months
- Merge_with_Product_Details
```

**Add comments**:
- Right-click step → Properties → Add description
- Especially for complex M code
- Explain WHY, not just WHAT

**Group related queries**:
- Use folders/groups in Query Editor
- "Source Queries", "Transformation", "Output"

**Benefits**:
- Future you understands what past you did
- Others can understand your work
- Easier to debug and maintain

---

### 7. Keep It Simple

**The Principle**: Simplest solution that works is usually best.

**Examples**:

**Over-complicated**:
```
❌ 15 steps with complex M code to combine sheets
✅ Combine Files from folder → 3 clicks
```

**Over-engineered**:
```
❌ Custom function with error handling for every scenario
✅ Simple unpivot + filter → Handles 90% of cases
```

**Too clever**:
```
❌ Nested Table.AddColumn with LAMBDA and List.Transform
✅ Add Conditional Column in UI → Same result, readable
```

**When to use M code**:
- UI can't do it
- Complex conditional logic needed
- Performance optimization required
- Truly dynamic behavior needed

**When to stick with UI**:
- UI can do it easily
- Steps are clear
- Future maintainer will understand

**Benefits**:
- Easier to maintain
- Easier to debug
- Easier for others to understand
- Less likely to break

**Rule of thumb**: If you're writing M code, ask "Can I do this in UI instead?" If yes, probably should.

---

## Specific Practices

### Working with Dates

**Always use "Using Locale"**:
- Transform → Data Type → Date → **Using Locale**
- Choose correct locale (US, UK, Thai, etc.)
- Never just click "Date" without specifying

**Handle mixed formats**:
- If possible, standardize at source
- If not, may need M code to detect and parse

**Store as actual dates**:
- Not text like "Jan 2024"
- Not Excel serial numbers (44927)
- Actual Date type for filtering and sorting

---

### Working with Numbers

**Always prefer Decimal Number over Whole Number**:
```
✅ Use Decimal Number by default:
- Prices, amounts, monetary values
- Rates, percentages, ratios
- Weights, measurements
- Any value that might have decimals in future

✅ Use Whole Number only when certain:
- Count of discrete items (truly no decimals possible)
- But even "quantity" might need decimals (0.5 kg sold)
```

**Why Decimal is safer**:
```
Scenario: Price column
Current data: 100, 200, 300 (all whole numbers)
Set as: Whole Number

Next month: 150.50 arrives
Result: ❌ Error or truncation → Data loss!

Better: Set as Decimal Number from start
→ Works with 100, 200, 300 now
→ Works with 150.50 later
→ Future-proof! ✅
```

**Principle**: 
> **"When in doubt, use Decimal Number"**
> Decimal handles whole numbers perfectly (100 = 100.0)
> But Whole Number cannot handle decimals

**Set correct types**:
- Whole Number vs Decimal Number
- Currency if appropriate

**Be aware of Banker's Rounding**:
- Default Power Query behavior
- Round .5 to even number
- Use Number.RoundAwayFromZero if need normal rounding

**Handle negative numbers**:
- Check if stored as text with parentheses: (100)
- May need Replace or custom parsing

---

### Working with Text

**Mind case sensitivity**:
- "Product" ≠ "product"
- Use Transform → Format → UPPERCASE or lowercase if needed

**Trim whitespace**:
- Transform → Format → Trim
- Removes leading/trailing spaces
- Prevents "Product " ≠ "Product" issues

**Handle special characters**:
- Be careful with quotes, commas in CSV
- Use proper delimiters

---

### Combining Data

**Append when** (vertical combination):
- Same structure, more records
- Sales from multiple months/regions
- Requirement: Column names must match

**Merge when** (horizontal combination):
- Related data, need to enrich
- Customers + Orders, Products + Categories
- Requirement: Common key column

**Combine Files when**:
- Multiple files, same structure
- Use Transform Sample File
- All transformations in sample, not in main query

---

### Performance Optimization

**Query Folding**:
- Keep source-side operations at top (filter, select columns)
- These push to database/source (fast)
- Local operations last (slower)

**Minimize steps**:
- Remove unnecessary steps
- Combine multiple operations if possible
- Delete redundant transformations

**Disable loading** when appropriate:
- Intermediate queries don't need to load to Excel
- Right-click query → Uncheck "Enable load"
- Reduces memory usage

**Buffer() carefully**:
- Only use when needed (multiple references to expensive operation)
- Overuse can slow things down

---

## Workflow Best Practices

### Starting a New Project

**1. Understand the need**:
- What's the final output?
- Who will use it?
- How often will it refresh?

**2. Find true source**:
- Where does data really come from?
- Can we connect directly?

**3. Create query workbook**:
- Separate from data
- Build transformations

**4. Develop iteratively**:
- Get raw data
- Fix structure
- Fix content
- Create outputs
- Test refresh

**5. Document**:
- Name queries clearly
- Rename steps
- Add comments

---

### Maintaining Existing Queries

**When data changes**:
- Don't panic!
- Check Applied Steps one by one
- Find where it breaks
- Understand why
- Fix or adapt

**When source changes**:
- Update connection
- Verify column names still match
- Test thoroughly

**When requirements change**:
- Add new transformations after existing ones
- Don't delete old steps unless sure
- Test with existing outputs first

**Regular review**:
- Periodically check queries still make sense
- Remove obsolete steps
- Optimize if slow
- Update documentation

---

## Common Patterns

### Pattern: Monthly Report Refresh

**Setup**:
1. Query workbook points to SharePoint folder
2. Transformations process latest files
3. Output loaded to reporting workbook
4. Pivot Tables/charts use query output

**Monthly process**:
1. New data arrives in SharePoint
2. Open report workbook
3. Data → Refresh All
4. Charts update automatically
5. Done!

---

### Pattern: Combining Multiple Sources

**Setup**:
1. Query for each source (Database, SharePoint, CSV)
2. Transform each to common structure
3. Append all sources
4. Final cleanup and loading

**Benefits**:
- Single source of truth
- All data cleaned consistently
- Easy to add new sources

---

### Pattern: Lookup/Reference Tables

**Setup**:
1. Main transaction query
2. Separate queries for lookups (products, customers, categories)
3. Merge main with lookups to enrich
4. Final output has all needed info

**Tips**:
- Set lookups to "Enable load = False"
- Use Left Outer join for main data
- Expand only needed columns from lookup

---

## Checklist for "Good" Queries

Before considering query complete, check:

**Structure**:
- [ ] Connects to true source (not manually edited file)
- [ ] Separate query workbook (if appropriate)
- [ ] Can refresh automatically

**Transformations**:
- [ ] Headers fixed first
- [ ] Types set correctly (with "Using Locale" for dates)
- [ ] No hardcoded filters
- [ ] Future-proofed (new columns/values will work)

**Quality**:
- [ ] Steps are clear and necessary
- [ ] Queries and steps named well
- [ ] Complex steps have comments
- [ ] Old/redundant steps removed

**Testing**:
- [ ] Tested with different data
- [ ] Refresh works
- [ ] Output is correct format
- [ ] Performance acceptable

**Documentation**:
- [ ] Source documented
- [ ] Purpose clear
- [ ] Maintained able by others

---

## Anti-Patterns to Avoid

**Don't**:
- ❌ Edit source data manually before loading
- ❌ Use "Unpivot Columns"
- ❌ Hardcode values that might change
- ❌ Skip "Using Locale" for dates
- ❌ Have queries and data in same file (for production)
- ❌ Use checkbox filters for dynamic data
- ❌ Make changes to main query in Combine Files
- ❌ Filter before Fill Down on grouped data

**Do**:
- ✅ Automate everything in Power Query
- ✅ Use "Unpivot Other/Selected Columns"
- ✅ Use dynamic conditions and parameters
- ✅ Always specify date locale
- ✅ Separate query workbook for portability
- ✅ Use Remove Empty or conditions
- ✅ Transform Sample File for Combine Files
- ✅ Fill Down before filtering

---

## Learning Path

### For Beginners:
1. Master UI basics (unpivot, pivot, merge, append)
2. Understand data types and dates
3. Learn to use Applied Steps
4. Practice finding true source

### For Intermediate:
1. Future-proof transformations
2. Query workbook separation
3. Combine Files effectively
4. Understand when to merge vs append

### For Advanced:
1. Learn M code basics (when UI isn't enough)
2. Custom functions
3. Query folding optimization
4. Parameters and dynamic queries

---

## Remember

**Good Power Query practice is about**:
- **Reproducibility**: Press refresh, get results
- **Portability**: Works in different contexts
- **Future-proofing**: Handles change gracefully
- **Simplicity**: Clear, maintainable solutions
- **Documentation**: Future you will thank you

**The goal**: Build queries that work reliably, refresh automatically, and are easy to understand and maintain! 🎯✨

```

### references/common-pitfalls.md

```markdown
# Common Pitfalls and Recovery Strategies

## Table of Contents

1. [Overview](#overview)
2. [Critical Gotchas (Must Know!)](#critical-gotchas-must-know)
   - [1. M Code Column References with Special Characters](#1-m-code-column-references-with-special-characters)
   - [2. Case Sensitivity](#2-case-sensitivity)
   - [3. Lazy Filter (Hardcoded Values)](#3-lazy-filter-hardcoded-values)
   - [4. Wrong Order: Filter Before Fill Down](#4-wrong-order-filter-before-fill-down)
   - [5. Banker's Rounding](#5-bankers-rounding-not-normal-rounding)
   - [6. Date Locale Misinterpretation](#6-date-locale-misinterpretation)
   - [7. Auto "Changed Type" With Hardcoded Columns](#7-auto-changed-type-with-hardcoded-columns)
   - [8. Sort Then Remove Duplicates](#8-sort-then-remove-duplicates-loses-original-sort)
   - [9. Using Whole Number When Should Use Decimal](#9-using-whole-number-when-should-use-decimal)
   - [10. Unpivot Columns (The Forbidden One)](#10-unpivot-columns-the-forbidden-one)
   - [11. Merge vs Append Confusion](#11-merge-vs-append-confusion)
   - [12. Transform Sample File Not Used Correctly](#12-transform-sample-file-not-used-correctly)
3. [Mistakes by Experience Level](#mistakes-by-experience-level)
   - [Beginner Mistakes](#beginner-mistakes)
   - [Intermediate Mistakes](#intermediate-mistakes)
   - [Advanced Mistakes](#advanced-mistakes)
4. [Recovery Strategies](#recovery-strategies)
   - [General Recovery Approach](#general-recovery-approach)
   - [Specific Recoveries](#specific-recoveries)
5. [Proactive Warning Phrases](#proactive-warning-phrases)
6. [Diagnostic Questions for Troubleshooting](#diagnostic-questions-for-troubleshooting)
7. [Prevention > Recovery](#prevention--recovery)
8. [Remember](#remember)

---

## Overview

This guide covers common mistakes users make with Power Query, why they happen, and how to recover. Use this when user encounters issues or to warn them proactively.

## Critical Gotchas (Must Know!)

### 1. M Code Column References with Special Characters

**Problem**: Column names with special characters need special syntax in M code

**Where it matters**:
- Column names with `/` : `Factory/Warehouse`
- Column names with `-` : `Product-Code`
- Column names with spaces: `My Column`
- Column names starting with numbers: `2024 Sales`
- Any special characters or reserved words

**Correct syntax**:
```
❌ Wrong: [Factory/Warehouse]
✅ Right: [#"Factory/Warehouse"]

❌ Wrong: [Product-Code]
✅ Right: [#"Product-Code"]

❌ Wrong: [My Column]
✅ Right: [#"My Column"]
```

**Rule**: 
- Simple names (letters, numbers, underscore): `[ColumnName]`
- Special characters or spaces: `[#"Column Name"]`

**Why it matters**:
- If you write M code or conditional columns with wrong syntax → Error
- Example in Factory/Warehouse case: `if [TXID] = null then [#"Factory/Warehouse"] else null`

**Prevention**:
- When referencing columns in formulas, check if name has special characters
- Use `[#"..."]` syntax for any non-simple names
- Power Query UI usually generates correct syntax, but manual formulas need attention

---

### 2. Case Sensitivity

**Problem**: Power Query is case-sensitive EVERYWHERE

**Where it matters**:
- Column names: "Sales" ≠ "sales"
- Filter values: "Product A" ≠ "product a"  
- M code: `Table.SelectRows` ≠ `table.selectrows`
- Combine files: Files with different casing create separate columns

**Example disaster**:
```
File1.xlsx has column: "Product"
File2.xlsx has column: "product"
→ Combine Files creates TWO columns: "Product" and "product"
→ Data split across both, analysis broken
```

**Prevention**:
- Standardize column names at source
- Use Transform > Format > UPPERCASE or lowercase
- Always check combined data for duplicate columns with different cases

**Recovery**:
- Rename columns to match
- Or merge columns if data already split

**Why it happens**: Power Query treats text literally, unlike Excel which is case-insensitive

---

### 2. Lazy Filter (Hardcoded Values)

**Problem**: Using UI checkbox filter hardcodes specific values into query

**Where it happens**:
- Click filter dropdown
- Check/uncheck specific items
- This creates: `Table.SelectRows(#"Previous Step", each [Column] = "Value1" or [Column] = "Value2")`

**Why it's bad**:
```
January data: Filter shows "Product A", "Product B", "Product C"
→ You uncheck "Product C"
→ Query hardcodes: Keep only "Product A" and "Product B"
→ February data: New "Product D" arrives
→ Query still only keeps A and B
→ Product D is silently excluded!
```

**Better alternatives**:

**Alternative 1: Remove Empty**
- If you want to remove blanks:
- Click filter dropdown → **Remove Empty**
- This creates dynamic filter, not hardcoded values

**Alternative 2: Keep/Remove Rows with condition**
- **Home** tab → **Remove Rows** → **Remove Blank Rows**
- Or use conditional logic: "keep rows where [Sales] > 0"

**Alternative 3: Custom condition in filter**
- Instead of checking boxes, use "Text Filters" or "Number Filters"
- "Contains", "Greater than", etc. are dynamic

**Recovery**:
- Go to Applied Steps
- Find the filter step
- Look at formula bar: if it says `= "Value1" or = "Value2"` → it's hardcoded
- Delete the step and redo with Remove Empty or condition

**Why it happens**: UI checkbox filter is convenient but creates static filter

---

### 3. Wrong Order: Filter Before Fill Down

**Problem**: Filtering before Fill Down loses hierarchy information permanently

**Scenario**:
```
| Factory | Sales |
|---------|-------|
| A       |       | <- Group header
| WH-001  | 100   |
| WH-002  | 150   |
| B       |       | <- Group header
```

**Wrong order**:
1. Filter out empty Sales → Removes A and B rows
2. Try to Fill Down → Nothing to fill! Factory info is gone

**Correct order**:
1. Fill Down Factory column → A fills down to WH-001 and WH-002
2. Filter out empty Sales → Removes group headers but data rows have Factory

**Recovery**:
- If you already filtered: **Undo is your only hope**
- Delete the filter step in Applied Steps
- Redo: Fill Down first, then filter

**Rule**: **Always Fill Down before filtering grouped data**

**Why it happens**: Users see empty rows and want to remove them immediately, not realizing they contain critical information

---

### 4. Banker's Rounding (Not Normal Rounding!)

**Problem**: Power Query uses banker's rounding, not mathematical rounding

**What it is**:
- Normal rounding: 0.5 → 1, 1.5 → 2, 2.5 → 3
- Banker's rounding: 0.5 → 0 (even), 1.5 → 2 (even), 2.5 → 2 (even)
- Rule: Round .5 to nearest EVEN number

**Example**:
```
Number.Round(0.5) = 0  (not 1!)
Number.Round(1.5) = 2  (correct)
Number.Round(2.5) = 2  (not 3!)
Number.Round(3.5) = 4  (correct)
```

**Why it matters**:
- Financial calculations may be off
- Sum of rounded values ≠ expected
- Different from Excel's ROUND()

**Solution**:
- If you need mathematical rounding: Use M code `Number.RoundAwayFromZero()`
- If banker's rounding is OK: No action needed
- Be aware: Default is banker's

**Why it happens**: Banker's rounding reduces statistical bias in large datasets

---

### 5. Date Locale Misinterpretation

**Problem**: Dates interpreted with wrong locale (US vs UK/TH)

**Disaster scenario**:
```
Data has: 01/12/2024
Intended: December 1, 2024 (UK/TH format: DD/MM/YYYY)
Power Query assumes: January 12, 2024 (US format: MM/DD/YYYY)
→ All dates wrong!
→ December becomes January, etc.
```

**How it happens**:
- Click column type dropdown → Date (WITHOUT "Using Locale")
- Power Query guesses locale based on system settings
- Often guesses wrong

**Prevention**:
- **ALWAYS** use "Using Locale" when setting date type
- Click Transform tab → Data Type → **Using Locale**
- Explicitly choose: English (United States), English (United Kingdom), Thai, etc.

**Detection**:
- Spot check a few dates after conversion
- If December 1 became January 12 → wrong locale
- Some dates may error if impossible (e.g., 13/05 in MM/DD)

**Recovery**:
- Delete the "Changed Type" step
- Redo with correct locale:
  - Transform → Data Type → Date → **Using Locale** → [Correct Locale]

**Why it happens**: Date formats are ambiguous, and defaults often wrong

---

### 6. Auto "Changed Type" With Hardcoded Columns

**Problem**: Power Query auto-generates "Changed Type" step when loading data, hardcoding column names

**Scenario**:
```
1. Load data with columns: Product, Jan, Feb, Mar
2. Auto "Changed Type" step created: Sets types for these 4 columns
3. You unpivot Jan, Feb, Mar → Now have Product, Month, Value
4. Auto "Changed Type" still references Jan, Feb, Mar
5. Refresh → Error! Jan, Feb, Mar don't exist anymore
```

**Detection**:
- Look at Applied Steps panel
- Find "Changed Type" step (usually first step)
- Look at formula bar: `= Table.TransformColumnTypes(#"Promoted Headers",{{"Jan", type number}, {"Feb", type number}...})`
- If it references columns you transformed → it's a problem

**Prevention**:
- After major transformations (unpivot, pivot, merge): Delete old "Changed Type"
- Re-apply data types to new structure
- New "Changed Type" step created with current columns

**Recovery**:
- Delete the problematic "Changed Type" step
- Select all columns
- Set correct types again
- For dates: Use "Using Locale"

**Why it happens**: Power Query tries to be helpful by auto-detecting types, but locks in column names

---

### 7. Sort Then Remove Duplicates (Loses Original Sort!)

**Problem**: Using "Remove Duplicates" after "Sort Rows" removes the sort

**What happens**:
```
1. Sort by Date (oldest to newest)
2. Remove Duplicates by Product (to keep first occurrence)
3. Result: Duplicates removed, but order is NOT by date anymore!
```

**Why**:
- "Remove Duplicates" doesn't preserve previous sort
- It uses its own internal ordering

**Solution**:
- If you need first/last by sort order:
  - Sort first
  - **Group By** with "All Rows" option
  - Take first row from each group
  - This preserves sort order

**Alternative**:
- Add an index column while sorted
- Remove duplicates
- Sort by index
- Remove index

**Why it happens**: Remove Duplicates is not sort-aware

---

### 9. Using Whole Number When Should Use Decimal

**Problem**: Setting numeric columns as Whole Number when they might have decimals in the future

**Scenario**:
```
Current data: Price = 100, 200, 300 (no decimals)
→ Auto-detect or user sets as Whole Number

Next month: Price = 150.50 (has decimals)
→ Query errors or truncates decimals → Data loss!
```

**Why it's bad**:
- Whole Number truncates/rejects decimal values
- Future data with decimals will error or lose precision
- Common in: Prices, amounts, rates, percentages, weights

**Best Practice**:
```
✅ Use Decimal Number by default for:
- Prices (99.99, 149.50)
- Amounts (1,234.56)
- Rates and percentages (98.5%)
- Weights/measurements (1.5 kg)
- Any monetary values

✅ Use Whole Number only when certain:
- Count of items (truly can't have decimals)
- IDs/codes (but Text is usually better)
```

**Principle**: 
> **"When in doubt, use Decimal Number"**
> - It's future-proof (supports decimals later)
> - It handles whole numbers fine (100 = 100.0)
> - Safer than risking data loss

**Prevention**:
- Default to Decimal Number for all numeric values
- Only use Whole Number when absolutely certain no decimals possible
- Even "quantity" might have decimals (0.5 kg sold)

**Recovery**:
- Change column type to Decimal Number
- Refresh query
- Check if any data was lost in previous runs

---

### 10. Unpivot Columns (The Forbidden One)

**Problem**: "Unpivot Columns" option creates weird formula that causes issues

**What's wrong**:
- Records formula differently than Other/Selected
- Can cause unexpected behavior later
- Power Query experts avoid it

**What to use instead**:
- "Unpivot Other Columns" - Select columns to keep fixed, unpivot the rest
- "Unpivot Only Selected Columns" - Select columns to unpivot specifically

**Why this matters**:
- Future-proofing
- Reliability
- Best practices from experts

**Rule**: ❌ Never use "Unpivot Columns" - Use Other or Selected instead

---

### 11. Merge vs Append Confusion

**Problem**: Using Merge when should Append, or vice versa

**Append (Union)**:
- Combines tables VERTICALLY (rows on top of rows)
- Use when: Same structure, different data (Jan + Feb sales, North + South regions)
- Requirement: Column names must match
- Example: 100 rows + 200 rows = 300 rows, same columns

**Merge (Join)**:
- Combines tables HORIZONTALLY (columns side by side)
- Use when: Related data, need to match rows (Customers + Orders)
- Requirement: Common key column
- Example: 100 rows + metadata = 100 rows, more columns

**Common mistakes**:
- Trying to append tables with different column names → Extra columns created
- Trying to merge tables that should be appended → One-to-many issues

**Solution**:
- Ask: "Same topic, more records?" → Append
- Ask: "Related topics, need to enrich?" → Merge

---

### 12. Transform Sample File Not Used Correctly

**Problem**: Making changes to main combined query instead of Transform Sample File

**Scenario**:
```
Combine Files from folder
→ Creates "Transform Sample File" and main query
→ User edits main query directly
→ Changes aren't applied to all files consistently
```

**Correct approach**:
1. Find "Transform Sample File" query
2. Make ALL transformations there (unpivot, remove rows, etc.)
3. Main query automatically applies to all files

**Why**:
- Transform Sample File = template
- Changes there apply to every file
- Changes in main query only affect combined result

**Detection**:
- If some files seem transformed differently
- Check if Transform Sample File was edited

---

## Mistakes by Experience Level

### Beginner Mistakes

**1. Trying to edit in Excel instead of Power Query**
- Problem: Make manual changes in Excel, then can't reproduce
- Solution: Put ALL data prep in Power Query for reproducibility

**2. Not knowing where source is**
- Problem: Can't refresh because don't know original file location
- Solution: Always document and use "true source" (not manually edited files)

**3. Clicking "Load" too early**
- Problem: Load messy data without transforming
- Solution: Always "Transform Data" first, inspect and clean, then Load

**4. Promoting headers at wrong time**
- Problem: Headers get treated as data or vice versa
- Solution: Check data carefully, promote headers after removing top rows if needed

**5. Not checking data types**
- Problem: Numbers stored as text, dates as text
- Solution: Always verify data types, use "Using Locale" for dates

---

### Intermediate Mistakes

**1. Hardcoded filters (Lazy Filter)**
- Covered above - use dynamic conditions instead

**2. Not future-proofing**
- Problem: Query works now but breaks when columns/values change
- Solution: Use "Other Columns", conditions, not hardcoded lists

**3. Combining files with mismatched columns**
- Problem: Case differences, spelling differences create duplicate columns
- Solution: Standardize at source or transform in sample file

**4. Ignoring Applied Steps**
- Problem: Don't understand what each step does
- Solution: Review steps, rename unclear ones, delete unnecessary ones

**5. Over-complicating transformations**
- Problem: Too many steps, convoluted logic
- Solution: Simplify, combine steps, use better UI options

---

### Advanced Mistakes

**1. Not optimizing query folding**
- Problem: Slow queries because transformations done locally instead of at database
- Solution: Understand query folding, keep folding steps at top

**2. Creating too many queries**
- Problem: Many single-use queries instead of reusable functions
- Solution: Create custom functions for repeated transformations

**3. Mixing M code and UI haphazardly**
- Problem: Hard to maintain, unclear logic
- Solution: Be consistent - mostly UI with M when needed, or structured M code

**4. Not handling errors**
- Problem: One bad row breaks entire query
- Solution: Use try...otherwise, remove errors, or handle specifically

---

## Recovery Strategies

### General Recovery Approach

**1. Check Applied Steps**
- Find where issue occurred
- Read formula bar to understand what step does

**2. Use "Go Back"**
- Click earlier step in Applied Steps
- See data at that point
- Identify exactly where it broke

**3. Delete problematic step**
- Right-click step → Delete
- Or edit formula bar directly

**4. Redo correctly**
- Reapply transformation the right way
- Verify results

**5. Test with new data**
- Add test record
- Refresh query
- Ensure it still works

---

### Specific Recoveries

**Problem: Wrong data types**
→ Delete "Changed Type" step → Re-apply correct types with "Using Locale" for dates

**Problem: Lost data after filter**
→ Delete filter step → Fill Down first → Re-apply filter

**Problem: Hardcoded values**
→ Delete filter step → Use "Remove Empty" or condition instead

**Problem: Columns not combining**
→ Check column names (case sensitivity!) → Rename to match → Combine again

**Problem: Dates wrong**
→ Delete date conversion → Use "Using Locale" with correct locale

**Problem: Query breaks on refresh**
→ Check for hardcoded column names in "Changed Type" → Delete → Re-apply

**Problem: Too slow**
→ Check if query folding works → Keep source-side operations at top → Limit local transformations

---

## Proactive Warning Phrases

Use these when guiding users:

**For Fill Down + Filter**:
> "⚠️ สำคัญมาก: ต้อง Fill Down ก่อน แล้วค่อย Filter นะคะ ถ้าทำกลับกัน Factory information จะหายไปเลย!"

**For Unpivot choice**:
> "⚠️ อย่าเลือก 'Unpivot Columns' นะคะ ใช้ 'Unpivot Other Columns' หรือ 'Unpivot Only Selected Columns' แทน เพราะวิธีบันทึกสูตรมันแปลกและอาจมีปัญหาภายหลัง"

**For Dates**:
> "⚠️ ตอนตั้ง data type เป็น Date ต้องใช้ 'Using Locale' เสมอนะคะ มิฉะนั้นวันที่อาจผิด! (01/12 อาจตีความเป็น Jan 12 หรือ Dec 1 ก็ได้)"

**For Case Sensitivity**:
> "⚠️ ระวังนะคะ Power Query เป็น case-sensitive แยก 'Sales' ≠ 'sales' ทุกที่เลย"

**For Auto Changed Type**:
> "⚠️ หลังจากทำ unpivot เสร็จ ให้ลบ 'Changed Type' step เก่าทิ้ง แล้วตั้ง data type ใหม่นะคะ ไม่งั้นจะ error ตอน refresh!"

**For Checkbox Filter**:
> "⚠️ ถ้าใช้ checkbox filter มันจะ hardcode ค่าที่เลือกไว้ ข้อมูลใหม่ที่ไม่ได้ check จะถูกกรองออกทันที ให้ใช้ 'Remove Empty' หรือ condition แทนจะดีกว่าค่ะ"

---

## Diagnostic Questions for Troubleshooting

**When user says "it's not working"**:

1. "ขั้นตอนไหนที่เกิด error คะ? เห็น message อะไรไหม?"
2. "ลอง refresh ดูแล้ว error หรือแค่ผลลัพธ์ไม่ตรงที่คาดหวังคะ?"
3. "พี่ทำอะไรไปก่อนหน้านี้บ้างคะ? (เพิ่มคอลัมน์? เปลี่ยน source?)"
4. "ลอง click ไปดู Applied Steps ที่ผ่านๆ มาแล้ว step ไหนที่ข้อมูลเริ่มผิดปกติคะ?"

**When troubleshooting together**:
1. Go through Applied Steps one by one
2. At each step, check data preview
3. Identify exactly where it breaks
4. Look at formula bar to understand what step does
5. Fix or redo that step

---

## Prevention > Recovery

**Best practices to avoid issues**:

1. **Use "Other Columns" or "Selected Columns" for unpivot** - Future-proof
2. **Always "Using Locale" for dates** - Prevent misinterpretation
3. **Delete auto "Changed Type" after major transforms** - Prevent hardcoded names
4. **Check case sensitivity in column names** - Especially when combining files
5. **Fill Down before Filter** - Preserve hierarchy
6. **Use dynamic filters** - Remove Empty, conditions, not checkbox lists
7. **Test with new data** - Verify query works with different inputs
8. **Document source** - Know where data really comes from
9. **Review Applied Steps** - Understand every step's purpose
10. **Keep it simple** - Fewer steps = fewer chances for errors

---

## Remember

**Good habits**:
- Check data at each step
- Understand formula bar
- Think about future data
- Test refresh with new data
- Document source and transformations

**When in trouble**:
- Don't panic
- Check Applied Steps
- Go back to last working step
- Redo correctly
- Learn from mistake

The goal: Build robust, reliable queries that handle real-world messiness! 💪✨

```

### references/examples.md

```markdown
# Power Query Examples - Before & After

## Table of Contents

1. [Overview](#overview)
2. [Example 1: Wide Format (Employee Sales)](#example-1-wide-format-employee-sales)
3. [Example 2: Stacked Metrics (Product + Payment Method)](#example-2-stacked-metrics-product--payment-method)
4. [Example 3: Grouped Data (Factory/Warehouse Hierarchy)](#example-3-grouped-data-factorywarehouse-hierarchy)
5. [Example 4: Multi-Row Headers](#example-4-multi-row-headers)
6. [Example 5: Date Locale Issues](#example-5-date-locale-issues)
7. [Pattern Summary](#pattern-summary)
8. [Remember](#remember)

---

## Overview

This file contains real-world examples of data transformation problems and their solutions. Each example shows:
- Before: Original problematic data
- Problems identified
- After: Proper structure
- Step-by-step solution
- Key lessons

Use these as reference when coaching users with similar problems.

---

## Example 1: Wide Format (Employee Sales)

### Before
```
| Product  | Jan | Feb | Mar | Apr |
|----------|-----|-----|-----|-----|
| Phone    | 100 | 150 | 200 | 120 |
|          | 80  | 90  | 110 | 95  |
| Laptop   | 200 | 220 | 250 | 230 |
|          | 180 | 190 | 210 | 200 |
```

### Problems Identified

1. **Wide format** 🔴
   - Months (Jan, Feb, Mar, Apr) are spread across columns
   - Should be: One "Month" column, one "Sales" column
   - Impact: Pivot Table will see 4 separate fields, can't filter by month easily

2. **Blank cells** 🟡
   - Product column has cells intentionally left blank for visual grouping
   - Only first row of each group has product name
   - Impact: Rows 2, 4 have blank product → data loss

### After (Proper Structure)
```
| Product | Month | Sales |
|---------|-------|-------|
| Phone   | Jan   | 100   |
| Phone   | Jan   | 80    |
| Phone   | Feb   | 150   |
| Phone   | Feb   | 90    |
| Phone   | Mar   | 200   |
| Phone   | Mar   | 110   |
| Phone   | Apr   | 120   |
| Phone   | Apr   | 95    |
| Laptop  | Jan   | 200   |
| Laptop  | Jan   | 180   |
... (and so on)
```

### Step-by-Step Solution

**Step 1: Fill Down (Fix merged cells)**
```
Action: Right-click "Product" column → Fill → Down
Result: All blank product cells now filled with value from above
```

**Step 2: Unpivot Other Columns (Convert wide → long)**
```
Action: 
1. Select "Product" column (the ID column to keep)
2. Transform tab → Unpivot Other Columns
Result: Jan, Feb, Mar, Apr converted to:
   - "Attribute" column (contains month names)
   - "Value" column (contains sales numbers)
```

**Step 3: Rename columns**
```
Action: 
- Double-click "Attribute" → rename to "Month"
- Double-click "Value" → rename to "Sales"
```

**Step 4: Set data types**
```
Action:
- Product: Text (already correct)
- Month: Text (already correct)
- Sales: Whole Number
```

### Why This Works

**Concept**: 
- Merged cells → Fill Down makes data explicit
- Wide format → Unpivot creates long format
- One column per concept (Product, Month, Sales)

**Future-proof**:
- If May is added → Automatically unpivoted (we used "Unpivot Other Columns")
- If new product added → Works automatically

**Analysis-ready**:
- Can filter by any month
- Can calculate by product
- Pivot Table has clear 3 fields
- Can create time series charts

### Key Lessons

1. **Always Fill Down before Unpivot** - Order matters!
2. **Use "Unpivot Other Columns"** - Future-proof for new months
3. **Wide → Long is the goal** - Not just single-row headers

---

## Example 2: Stacked Metrics (Product + Payment Method)

### Before
```
| Product | Payment | Attribute | Value |
|---------|---------|-----------|-------|
| Phone   | Cash    | Sales     | 1000  |
| Phone   | Cash    | Units     | 50    |
| Phone   | Card    | Sales     | 1200  |
| Phone   | Card    | Units     | 60    |
| Laptop  | Cash    | Sales     | 2000  |
| Laptop  | Cash    | Units     | 40    |
```

### Problems Identified

1. **Stacked metrics** 🟡
   - Sales and Units are in same column (Attribute)
   - Values are in single "Value" column
   - Should be: Separate Sales and Units columns
   - Impact: Hard to calculate (Sales / Units), inefficient in Pivot

### After (Proper Structure)
```
| Product | Payment | Sales | Units |
|---------|---------|-------|-------|
| Phone   | Cash    | 1000  | 50    |
| Phone   | Card    | 1200  | 60    |
| Laptop  | Cash    | 2000  | 40    |
```

### Step-by-Step Solution

**Step 1: Pivot Column (Separate metrics)**
```
Action:
1. Select "Attribute" column
2. Transform tab → Pivot Column
3. In dialog:
   - Values Column: "Value"
   - Advanced Options: Choose "Don't Aggregate"
4. Click OK
Result: "Attribute" values (Sales, Units) become column headers
```

**Step 2: Verify and set types**
```
Action:
- Product: Text
- Payment: Text
- Sales: Whole Number
- Units: Whole Number
```

### Why This Works

**Concept**:
- Pivot converts row values to column headers
- "Don't Aggregate" keeps values as-is
- Each metric (Sales, Units) gets own column

**Analysis-ready**:
- Can calculate Sales / Units easily
- Pivot Table has clear 4 fields
- Can sum/average each metric independently

### Key Lessons

1. **Pivot for stacked metrics** - When multiple measures in one column
2. **Don't Aggregate** - Critical! We want values as-is, not summed
3. **Separate topics** - Sales ≠ Units, deserve separate columns

---

## Example 3: Grouped Data (Factory/Warehouse Hierarchy)

### Before
```
| Factory/WH | TXID   | Buyer   | Product | Price | Qty |
|------------|--------|---------|---------|-------|-----|
| A          |        |         |         |       |     |
| WH-001     | TX0001 | sales ก | อาหาร   | 90    | 6   |
| WH-004     | TX0002 | sales ก | หนังสือ | 190   | 1   |
| WH-002     | TX0003 | sales ข | dvd หนัง | 399  | 3   |
| WH-003     | TX0004 | sales ก | อาหาร   | 90    | 4   |
| B          |        |         |         |       |     |
| WH-006     | TX0005 | sales ง | ของเล่น | 250   | 1   |
| WH-007     | TX0006 | sales ข | อาหาร   | 40    | 3   |
| WH-006     | TX0007 | sales ข | หนังสือ | 250   | 3   |
| C          |        |         |         |       |     |
| WH-008     | TX0008 | sales ง | ของเล่น | 250   | 2   |
| WH-010     | TX0009 | sales ง | ของเล่น | 250   | 1   |
| WH-009     | TX0010 | sales ข | ของเล่น | 190   | 1   |
```

### Problems Identified

1. **Grouped data (hidden hierarchy)** 🔴
   - Group headers (A, B, C) inserted as separate rows
   - Data rows (WH-001, WH-004, etc.) don't have factory information
   - Empty cells in all data columns for group header rows
   - Impact: Can't analyze by factory, hierarchy is implicit

2. **Mixed hierarchy levels** 🟡
   - Factory and Warehouse in same column
   - Need to separate for proper analysis

### After (Proper Structure)
```
| Factory | Warehouse | TXID   | Buyer   | Product | Price | Qty |
|---------|-----------|--------|---------|---------|-------|-----|
| A       | WH-001    | TX0001 | sales ก | อาหาร   | 90    | 6   |
| A       | WH-004    | TX0002 | sales ก | หนังสือ | 190   | 1   |
| A       | WH-002    | TX0003 | sales ข | dvd หนัง | 399  | 3   |
| A       | WH-003    | TX0004 | sales ก | อาหาร   | 90    | 4   |
| B       | WH-006    | TX0005 | sales ง | ของเล่น | 250   | 1   |
| B       | WH-007    | TX0006 | sales ข | อาหาร   | 40    | 3   |
| B       | WH-006    | TX0007 | sales ข | หนังสือ | 250   | 3   |
| C       | WH-008    | TX0008 | sales ง | ของเล่น | 250   | 2   |
| C       | WH-010    | TX0009 | sales ง | ของเล่น | 250   | 1   |
| C       | WH-009    | TX0010 | sales ข | ของเล่น | 190   | 1   |
```

### Step-by-Step Solution

**⚠️ CRITICAL ORDER: Fill Down → Create Factory Column → Fill Down Factory → Filter**

**Step 1: Fill Down (Make hierarchy explicit)**
```
Action: Right-click "Factory/WH" column → Fill → Down
Result: A fills down to WH-001, WH-004, WH-002, WH-003
        B fills down to WH-006, WH-007, WH-006
        C fills down to WH-008, WH-010, WH-009
        
Now every row has factory/warehouse information!
```

**Step 2: Separate hierarchy levels (Factory vs Warehouse)**

**💡 Best Method: Data-Driven Logic**

**Observation**:
- Factory header rows (A, B, C) → **No TXID** (null/empty)
- Warehouse data rows (WH-001, WH-004, etc.) → **Have TXID** (TX0001, TX0002, ...)

**This is the key!** Use data characteristics, not naming patterns.

```
Action:
1. Add Column tab → Conditional Column
2. Name: "Factory"
3. Condition: If [TXID] = null then [#"Factory/WH"] else null
   Note: Use [#"Factory/WH"] because column name has "/" special character!
4. This creates new "Factory" column with A, B, C where TXID is null, null elsewhere
```

**Why this is best**:
- ✅ Works with ANY factory name (A, AA, North, Site-01, anything!)
- ✅ Data-driven: checks if row has transaction data
- ✅ Future-proof: new factories automatically handled
- ✅ Clear logic: "No TXID = factory header"

**Less flexible alternatives**:
- `Text.Length([#"Factory/WH"]) = 1` - Only single characters
- `[#"Factory/WH"] = "A" or ... = "B" or ... = "C"` - Hardcoded

**Step 3: Fill Down the Factory column**
```
Action: Right-click "Factory" column → Fill → Down
Result: Now every row has factory information in the Factory column
```

**Step 4: ⚠️ Filter out header rows (DO NOT SKIP!)**
```
Action: 
1. Click "TXID" column
2. Click filter dropdown
3. Click "Remove Empty"
Result: Rows with only A, B, C (no TXID) are removed
        Only transaction rows remain

Critical: Without this step, you have duplicate/redundant header rows!
```

**Step 5: Rename and set types**
```
Action:
- Rename "Factory/WH" column to "Warehouse"
- Reorder columns: Factory, Warehouse, TXID, Buyer, Product, Price, Qty
- Set types:
  - Factory: Text
  - Warehouse: Text
  - TXID: Text
  - Buyer: Text
  - Product: Text
  - Price: Decimal Number (future-proof!)
  - Qty: Whole Number
```

### Why This Works

**Concept**:
- Fill Down converts implicit hierarchy (visual grouping) to explicit data
- Data-driven logic identifies which rows are headers vs data
- Filter removes now-useless header rows
- Result: clean hierarchy in every row

**Key principle**: Look at data characteristics (has TXID?) not naming patterns (text length)

**Future-proof**:
- New factories with any name → Works automatically
- New warehouses → Works automatically

**Analysis-ready**:
- Can analyze by Factory, Warehouse, or both
- Proper hierarchy for drill-down
- Every row has complete information
- Proper hierarchy for drill-down
- Every row has complete information

### Key Lessons

1. **⚠️ Fill Down BEFORE Filter** - Critical order! Filter first = lose hierarchy forever
2. **⚠️ Don't Forget to Filter** - After Fill Down, must remove header rows or you have duplicates
3. **Use Data-Driven Logic** - Check `if [TXID] = null` (data characteristics) not text length (patterns)
4. **M Code Special Characters** - Use `[#"Factory/WH"]` for column names with `/` or special characters
5. **Decimal for Amounts** - Use Decimal Number for Price (future-proof)

---

## Example 4: Multi-Row Headers

### Before
```
| Q1    | Q1    | Q2    | Q2    | Q3    | Q3    |
| Sales | Units | Sales | Units | Sales | Units |
|-------|-------|-------|-------|-------|-------|
| 1000  | 50    | 1200  | 60    | 1100  | 55    |
| 900   | 45    | 1150  | 58    | 1000  | 50    |
```

### Problems Identified

1. **Multi-row headers** 🟡
   - Headers span 2 rows (Quarter + Metric)
   - Power Query sees first row as data
   - Should be: Quarter, Sales, Units as separate columns

2. **Wide format** 🔴
   - Quarters spread across columns
   - Should be: One Quarter column

### After (Proper Structure)
```
| Quarter | Sales | Units |
|---------|-------|-------|
| Q1      | 1000  | 50    |
| Q1      | 900   | 45    |
| Q2      | 1200  | 60    |
| Q2      | 1150  | 58    |
| Q3      | 1100  | 55    |
| Q3      | 1000  | 50    |
```

### Step-by-Step Solution

#### Method 1: Transpose Method (For Simple Cases)

**Step 1: Keep only header rows**
```
Action: Home tab → Keep Rows → Keep Top Rows → Enter 2
Result: Only the 2 header rows remain
```

**Step 2: Transpose**
```
Action: Transform tab → Transpose
Result:
| Column1 | Column2 |
|---------|---------|
| Q1      | Sales   |
| Q1      | Units   |
| Q2      | Sales   |
| Q2      | Units   |
| Q3      | Sales   |
| Q3      | Units   |
```

**Step 3: Fill Down**
```
Action: Select Column1 → Right-click → Fill → Down
Result: All Q1 entries filled down, Q2 filled down, Q3 filled down
```

**Step 4: Merge columns**
```
Action: 
1. Select Column1, hold Ctrl, select Column2
2. Right-click → Merge Columns
3. Separator: Space
Result: "Q1 Sales", "Q1 Units", "Q2 Sales", "Q2 Units", "Q3 Sales", "Q3 Units"
```

**Step 5: Transpose back**
```
Action: Transform tab → Transpose
Result: Merged headers now in first row
```

**Step 6: Promote headers**
```
Action: Home tab → Use First Row as Headers
```

**Step 7: Delete old "Changed Type"**
```
Action: 
1. Find old "Changed Type" step in Applied Steps
2. Right-click → Delete
Reason: It has old column names, will break on refresh
```

**Step 8: Connect back to full data and unpivot**

Now you have: `Q1 Sales | Q1 Units | Q2 Sales | Q2 Units | Q3 Sales | Q3 Units`

But we need: `Quarter | Sales | Units`

**Step 9: Unpivot Other Columns** (assuming there's an ID column, if not, add index first)
```
Action: Select ID columns → Transform → Unpivot Other Columns
Result: Attribute (Q1 Sales, Q1 Units, ...) and Value columns
```

**Step 10: Split by delimiter**
```
Action: 
1. Select Attribute column
2. Transform → Split Column → By Delimiter → Space
Result: Two columns: Quarter (Q1, Q2, Q3) and Metric (Sales, Units)
```

**Step 11: Pivot the Metric**
```
Action:
1. Select Metric column
2. Transform → Pivot Column
3. Values Column: Value
4. Don't Aggregate
Result: Sales and Units as separate columns!
```

**Step 12: Final cleanup**
```
Action: Reorder columns, verify types
```

#### Method 2: Separate Header + Append (Reliable but Manual)

**When to use**: Transpose method is too complex or unreliable

**Approach**:
1. Manually define correct headers in separate query
2. Remove header rows from main data
3. Append correct headers to data

**Trade-off**: More manual, but always works

### Why This Works

**Concept**:
- Transpose converts columns↔rows, allowing manipulation of header structure
- Fill Down ensures every row has category information
- Merge combines multiple header rows
- Unpivot + Split + Pivot converts to proper long format

**Critical**:
- Must delete old "Changed Type" that has old column structure
- The goal isn't just single-row headers - it's **proper long format**

### Key Lessons

1. **Multi-row headers are complex** - Multiple steps needed
2. **Goal is structure, not just single row** - Must be Quarter | Sales | Units
3. **Delete old Changed Type** - Or query breaks on refresh
4. **Test with real data** - Transpose method can be tricky

---

## Example 5: Date Locale Issues

### Before
```
| Date       | Sales |
|------------|-------|
| 01/12/2024 | 100   |
| 02/01/2025 | 150   |
| 15/03/2024 | 200   |
```

### Problems Identified

1. **Ambiguous dates** 🔴
   - 01/12/2024 could be Jan 12 or Dec 1
   - US format (MM/DD/YYYY) vs UK/TH format (DD/MM/YYYY)
   - Stored as text, not Date type

### After (Proper Structure - Assuming Thai/UK Format)
```
| Date       | Sales |
|------------|-------|
| 2024-12-01 | 100   | <- December 1, 2024
| 2025-01-02 | 150   | <- January 2, 2025
| 2024-03-15 | 200   | <- March 15, 2024
```

### Step-by-Step Solution

**Step 1: Change type with locale**
```
Action:
1. Select Date column
2. Transform tab → Data Type dropdown → Date → Using Locale
3. Dialog appears: Choose "English (United Kingdom)" or "Thai"
4. Click OK
```

**Step 2: Verify**
```
Action: Spot check dates to ensure correct interpretation
Example: 01/12/2024 should become December 1 (not January 12)
```

### Why This Works

**Concept**:
- Locale tells Power Query HOW to interpret ambiguous dates
- Without locale, Power Query guesses (often wrong!)
- "Using Locale" explicitly specifies interpretation

**Critical**:
- **ALWAYS** use "Using Locale" when setting date types
- Never just click "Date" without specifying

### Key Lessons

1. **Always specify locale** - Even if it seems obvious
2. **Verify interpretation** - Check a few dates manually
3. **Standardize at source if possible** - Use ISO format (YYYY-MM-DD) to avoid ambiguity

---

## Pattern Summary

| Problem Type       | Key Transform       | Critical Warning                  |
|--------------------|---------------------|-----------------------------------|
| Wide Format        | Unpivot Other/Selected | Use Other/Selected, not "Unpivot Columns" |
| Merged Cells       | Fill Down           | Fill Down before other operations |
| Grouped Data       | Fill Down → Filter  | **ALWAYS** Fill Down before Filter! |
| Stacked Metrics    | Pivot Column        | Use "Don't Aggregate"             |
| Multi-row Headers  | Transpose + Fill + Merge | Delete old "Changed Type"         |
| Date Ambiguity     | Using Locale        | **ALWAYS** specify locale         |

---

## Remember

**Every example follows the same principles**:
1. **Structure first** - Fix headers and format before content
2. **One column = one concept** - Separate topics clearly
3. **Long format** - Not wide (unpivot when needed)
4. **Explicit data** - Every row complete (Fill Down)
5. **Future-proof** - Use "Other Columns", dynamic filters
6. **Verify types** - Especially dates with locale

The goal: Transform messy data into clean, analysis-ready format! 🎯✨

```

power-query-coaching | SkillHub