Data Model — 6 dim G4 ontology + DB schema reference
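A reference that lets developers, partners, and new team members quickly grasp the DB schema and ontology mapping of the SearchRight company data model (the result of the 6-dim G4 cascade).
§1 Company = 6-dim structured attributes
Each company row is represented in six dimensions (the G4 cascade result). The parser extracts them automatically from the natural-language query, and the UI filters and the API request body both use the same schema.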
| dim | DB columns | ontology source | normalization required |
|---|---|---|---|
| scale | tier enum, total_funding_krw, employee_count, annual_revenue_krw, investment_stage enum, is_hiring | (enum) | none |
| target_market | customer_type enum, target_age_bands[], target_gender, service_regions[], is_global | (enum) | none |
| tech | backend_stack[], frontend_stack[], infra_stack[], uses_ai_ml bool, ai_ml_details[] | apps/web/src/lib/tech-stack-ontology.ts, apps/web/src/lib/ai-ml-detail-ontology.ts | both the parser and the UI must normalize (inconsistent casing in the DB is recorded in an audit) |
| bm (business model) | service_archetypes[], revenue_models[], revenue_models_v2[], service_type, sales_channel, has_offline_store, has_recommendation/search/display_curation | apps/web/src/lib/revenue-model-ontology.ts | normalize revenue_models_v2 |
| culture | org_structure[], dev_process[] | apps/web/src/lib/culture-ontology.ts | expand slug → Korean tokens stored in the DB |
| grounding | domains[] (30+ slugs), narrow_segment (281 fine-grained) | apps/web/src/lib/segment-ontology.ts, docs/research/02-taxonomy.md | parser + manual mapping |
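To make the shared schema concrete, here is a minimal sketch of how the six dimensions could be typed. The column names come from the table above and the enum values from §2; the optional-field layout, the exact TypeScript types, and the `activeDims` helper are assumptions for illustration, not the contents of packages/db/src/schema.ts.

```typescript
// Sketch only: column names follow the §1 table, the types are assumptions.
type Tier =
  | "global_bigtech" | "domestic_bigtech" | "unicorn" | "enterprise"
  | "growth_stage" | "mid_sized" | "early_stage";
type CustomerType = "b2b" | "b2c" | "b2b2c" | "b2g";

interface CompanyDims {
  scale: { tier?: Tier; employee_count?: number; is_hiring?: boolean };
  target_market: { customer_type?: CustomerType; service_regions?: string[]; is_global?: boolean };
  tech: { backend_stack?: string[]; uses_ai_ml?: boolean };
  bm: { service_archetypes?: string[]; revenue_models_v2?: string[] };
  culture: { org_structure?: string[]; dev_process?: string[] };
  grounding: { domains?: string[]; narrow_segment?: string };
}

// Hypothetical helper: list the dimensions a filter actually constrains.
function activeDims(filter: Partial<CompanyDims>): string[] {
  return Object.entries(filter)
    .filter(([, value]) => value !== undefined && Object.keys(value).length > 0)
    .map(([dim]) => dim);
}

const exampleFilter: Partial<CompanyDims> = {
  scale: { tier: "unicorn" },
  bm: { service_archetypes: ["saas"] },
  culture: {}, // present but empty, so it does not count as active
};
```

Because the UI filter and the API request body share one shape, a helper like this could serve both entry points without translation between them.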
§2 Enum / set definitions
tier (company stage, 7 values)
global_bigtech / domestic_bigtech / unicorn / enterprise / growth_stage / mid_sized / early_stage
SQL pool tier_weight CASE (sparsity prior):
- global_bigtech: 2.0
- domestic_bigtech: 1.8
- unicorn: 1.5
- growth_stage: 1.2
- (else): 1.0
In the JS final stage, tier_weight = 1.0 (dead; per M-tier-weight-audit, restoring the multiplier results in -2 hits).
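The SQL pool weights listed above amount to a lookup with a default. The sketch below mirrors them in TypeScript for readability; the function name `poolTierWeight` is hypothetical, and the real logic lives in a SQL CASE expression, not in this form.

```typescript
// Mirrors the SQL pool CASE weights; any tier not listed falls into ELSE (1.0).
const POOL_TIER_WEIGHT = new Map<string, number>([
  ["global_bigtech", 2.0],
  ["domestic_bigtech", 1.8],
  ["unicorn", 1.5],
  ["growth_stage", 1.2],
]);

// Hypothetical helper name; a missing tier is treated like the ELSE branch.
function poolTierWeight(tier: string | null | undefined): number {
  if (tier == null) return 1.0;
  return POOL_TIER_WEIGHT.get(tier) ?? 1.0;
}
```

A Map is used instead of a plain object so that unexpected keys such as "constructor" cannot resolve to inherited properties.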
customer_type (target_market)
b2b / b2c / b2b2c / b2g
Activation condition for the b2b_saas domain proxy (fetchB2bSaasProxyIds in build-filter.ts):
customer_type = b2b OR service_archetypes includes saas
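The condition above can be expressed as an in-memory predicate. This is only a sketch: the `ProxyCandidate` type and `isB2bSaasProxy` name are hypothetical, and the actual fetchB2bSaasProxyIds presumably runs as a database query rather than over objects in memory.

```typescript
// Hypothetical row shape containing only the two columns the condition reads.
interface ProxyCandidate {
  customer_type?: string | null;
  service_archetypes?: string[] | null;
}

// True when a company qualifies for the b2b_saas domain proxy:
// customer_type = b2b OR service_archetypes includes saas.
function isB2bSaasProxy(company: ProxyCandidate): boolean {
  return (
    company.customer_type === "b2b" ||
    (company.service_archetypes ?? []).includes("saas")
  );
}
```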
service_archetypes[] (bm)
saas / marketplace / d2c / platform / app_service / commerce / ... (enum set)
revenue_models_v2[] (bm, G4 result)
subscription_recurring / transaction_commission / ads_display / license / ...
The parser must normalize these values (revenue-model-ontology.ts): Korean free text → English enum.
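As a rough illustration of the free-text → enum step, the sketch below maps keywords to the enum values listed above. The keyword patterns and the `normalizeRevenueModels` name are assumptions made for this example; the authoritative mapping is whatever revenue-model-ontology.ts defines.

```typescript
// Hypothetical keyword rules; the real ontology file may use different terms.
const REVENUE_RULES: Array<[RegExp, string]> = [
  [/구독|subscription/i, "subscription_recurring"],
  [/수수료|commission/i, "transaction_commission"],
  [/광고|\bads?\b/i, "ads_display"],
  [/라이선스|licen[sc]e/i, "license"],
];

// Returns every enum value whose rule matches, in rule order, without duplicates.
function normalizeRevenueModels(freeText: string): string[] {
  const matched: string[] = [];
  for (const [pattern, enumValue] of REVENUE_RULES) {
    if (pattern.test(freeText) && !matched.includes(enumValue)) {
      matched.push(enumValue);
    }
  }
  return matched;
}
```

Text that matches no rule yields an empty array, which leaves the decision about unmapped values to the caller.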
service_regions[] (target_market)
KR / GLOBAL / 미국 (US) / EU / 일본 (Japan) / ASEAN / ...
domains[] slug (grounding)
30+ slugs, for example: fintech / fintech_payment / fintech_lending / beauty_fmcg / b2b_saas / adtech / ai_deeptech / ...
Matched against expected_ids in docs/data/g5-testset.json.
narrow_segment (grounding)
281 fine-grained segments, for example: fintech_payment_pg / airline_seat_booking / beauty_skincare_d2c / ...
ontology v1.6 source: apps/web/src/lib/segment-ontology.ts.
§3 Matching a search query to the 6 dims
Examples of natural-language query → parser → 6-dim matching:
| query | matched dims |
|---|---|
| "광고 매출 메인 BM 인 한국 디지털 스타트업" (Korean digital startups whose main BM is ad revenue) | bm.revenue_models_v2 (ads_display) + target_market.service_regions (KR) + grounding.domains (digital) + scale.tier (early_stage / growth_stage) |
| "30~40대 여성 타겟 뷰티 D2C" (beauty D2C targeting women in their 30s to 40s) | target_market.target_age_bands (thirties_to_forties) + target_market.target_gender (female) + bm.service_archetypes (d2c) + grounding.domains (beauty_fmcg) |
| "엔터프라이즈 SaaS 한국 unicorn" (enterprise SaaS, Korean unicorn) | scale.tier (unicorn) + target_market.customer_type (b2b) + bm.service_archetypes (saas) + target_market.service_regions (KR) |
composeWhereClause in build-filter.ts composes the matches above into a drizzle ORM WHERE clause.
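The composition step can be pictured as turning each matched dim into one predicate and joining them with AND. The sketch below deliberately uses plain parameterized SQL strings instead of drizzle so that it stays self-contained; `ParsedFilter`, `composeWhereSketch`, and the assumption that the array columns are Postgres arrays (hence the `&&` overlap operator) are all hypothetical, and the real composeWhereClause may differ.

```typescript
// Hypothetical parser output covering only a few of the §1 columns.
interface ParsedFilter {
  tier?: string[];
  customer_type?: string;
  service_archetypes?: string[];
  revenue_models_v2?: string[];
  service_regions?: string[];
}

// Builds a parameterized WHERE body; each present field adds one AND-ed clause.
function composeWhereSketch(f: ParsedFilter): { sql: string; params: unknown[] } {
  const clauses: string[] = [];
  const params: unknown[] = [];
  // Registers a value and returns its positional placeholder ($1, $2, ...).
  const bind = (value: unknown): string => {
    params.push(value);
    return `$${params.length}`;
  };

  if (f.tier?.length) clauses.push(`tier = ANY(${bind(f.tier)})`);
  if (f.customer_type) clauses.push(`customer_type = ${bind(f.customer_type)}`);
  // Assumes Postgres array columns: && is true when the arrays share an element.
  if (f.service_archetypes?.length) clauses.push(`service_archetypes && ${bind(f.service_archetypes)}`);
  if (f.revenue_models_v2?.length) clauses.push(`revenue_models_v2 && ${bind(f.revenue_models_v2)}`);
  if (f.service_regions?.length) clauses.push(`service_regions && ${bind(f.service_regions)}`);

  return { sql: clauses.length > 0 ? clauses.join(" AND ") : "TRUE", params };
}

// Third example row of the table: enterprise SaaS, Korean unicorn.
const unicornSaas = composeWhereSketch({
  tier: ["unicorn"],
  customer_type: "b2b",
  service_archetypes: ["saas"],
  service_regions: ["KR"],
});
```

Binding values as parameters rather than interpolating them keeps parser output out of the SQL text itself, which is the same property a query builder such as drizzle provides.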
§4 G4 cascade statistics (2026-05-22)
- 7,023 companies enriched (LLM gpt-5.4-mini + WebSearch Phase 2)
- 5,624 rows with 6-dim attributes recorded (Tier A + B + part of C)
- recall@200: default 79.8% / opt-in llm_rerank 82.4% (testset of 11 queries × 3 dims)
- cumulative cost ~$367 ($368.58 after m-q-3b)
Detailed dimension scoreboard (after m-q-3b):
- Domain dimension: 85.8%
- BM dimension: 78.7% (G20)
- Target dimension: 81.1%
- Composite dimension: 76.0%
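For readers unfamiliar with the metric, recall@200 is read here in its usual sense: the share of a query's expected_ids that appear among the top 200 results. The sketch below implements that standard definition; whether the project's evaluation applies exactly this formula (for example in how it averages across queries) is an assumption, and docs/dev/eval-cycle.md remains the source of truth.

```typescript
// Standard recall@k: fraction of expected ids found in the first k ranked ids.
function recallAtK(rankedIds: string[], expectedIds: string[], k: number): number {
  if (expectedIds.length === 0) return 0; // no ground truth, nothing to recall
  const topK = new Set(rankedIds.slice(0, k));
  const hits = expectedIds.filter((id) => topK.has(id)).length;
  return hits / expectedIds.length;
}

// Toy data, not taken from the real testset.
const ranked = ["a", "b", "c", "d"];
const expected = ["a", "d", "z"];
const recallAt3 = recallAtK(ranked, expected, 3); // only "a" is in the top 3
const recallAt4 = recallAtK(ranked, expected, 4); // "a" and "d" are in the top 4
```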
§5 Core DB schema tables
| table | responsibility | key columns |
|---|---|---|
| companies | company basics + 6-dim structured attributes | id / name_ko / name_en / brand_name / aliases[] / tier / 6 dim columns (see §1 above) / is_excluded_from_search / manually_corrected_at |
| company_profiles | company description prose + embedding | company_id / source enum (llm_enriched / manual / homepage / innoforest) / content text / embedding vector(1536) / status |
| domains | domain slug definitions | id / slug / name_ko |
| company_domains | company ↔ domain N:N + primary flag | company_id / domain_id / is_primary |
| search_history | search call cache + audit | cache_key / normalized_query / filters_snapshot json / engine_flags_snapshot json / response json / feedback fields (PMF-2) |
| corrections | accumulated F-R3 propose-correction entries | company_id / proposed_by / fields_changed json / applied_at |
| api_keys + api_usage_log | external API (H-2/H-3/H-5) | key_hash / scope / rate_limit_rpm / per-call usage |
| users + sessions | NextAuth v5 (H-1) | email / password_hash / active / role |
drizzle schema source: packages/db/src/schema.ts.
§6 Preventing ontology drift
Required steps when adding or changing an ontology:
- Update the apps/web/src/lib/*-ontology.ts source.
- Update the parser SYSTEM_PROMPT (or the deterministic extraction layer enforceDeterministicSignalExtraction) at the same time, per the feedback_parser_prompt_global_perturb (FP-15) rule.
- Measure the impact on expected_ids in docs/data/g5-testset.json.
- Pass the dual-call eval gate (m-q-3b).
§7 Further reference
- Original 6-dim definition / G4 paradigm: docs/research/02-taxonomy.md
- Segment ontology source: apps/web/src/lib/segment-ontology.ts
- Measurement SSOT: docs/dev/eval-cycle.md
- Pipeline (11 modules): docs/dev/architecture.md §1
- User-facing explanation: docs/user/how-it-works.md §2