Updated · 2026-05-26

Data Model — 6 dim G4 ontology + DB schema reference

A reference for developers, partners, and new team members who need to quickly grasp the DB schema and ontology mapping behind SearchRight's company data model (the result of the 6-dim G4 cascade).

For the user-facing version, see docs/user/how-it-works.md §2 ("검색은 무엇을 보고", i.e. what search looks at).


§1 Company = 6-dim structured attributes

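Each company row is represented along 6 dimensions (the result of the G4 cascade). The parser extracts them automatically from a natural-language query, and the UI filters and the API request body both use the same schema.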

| dim | DB columns | ontology source | normalize obligation |
|---|---|---|---|
| scale | tier enum, total_funding_krw, employee_count, annual_revenue_krw, investment_stage enum, is_hiring | (enum) | |
| target_market | customer_type enum, target_age_bands[], target_gender, service_regions[], is_global | (enum) | |
| tech | backend_stack[], frontend_stack[], infra_stack[], uses_ai_ml bool, ai_ml_details[] | apps/web/src/lib/tech-stack-ontology.ts, apps/web/src/lib/ai-ml-detail-ontology.ts | both parser and UI must normalize (inconsistent DB casing was recorded in an audit) |
| bm (business model) | service_archetypes[], revenue_models[], revenue_models_v2[], service_type, sales_channel, has_offline_store, has_recommendation/search/display_curation | apps/web/src/lib/revenue-model-ontology.ts | normalize revenue_models_v2 |
| culture | org_structure[], dev_process[] | apps/web/src/lib/culture-ontology.ts | expand slug → Korean DB tokens |
| grounding | domains[] (30+ slugs), narrow_segment (281 fine-grained) | apps/web/src/lib/segment-ontology.ts, docs/research/02-taxonomy.md | parser + manual mapping |
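
The tech row requires normalization on both the parser and UI side because casing in the DB is inconsistent. A minimal sketch of what that can look like, assuming plain trim-and-lowercase folding is enough; `normalizeStackToken` and `stackMatches` are hypothetical names, and the real tech-stack-ontology.ts may rely on alias tables instead.

```typescript
// Hypothetical helper: fold a stack token to a canonical comparison key.
function normalizeStackToken(token: string): string {
  return token.trim().toLowerCase();
}

// Case-insensitive membership test against an array column such as backend_stack[].
function stackMatches(dbStack: string[], wanted: string): boolean {
  const key = normalizeStackToken(wanted);
  return dbStack.some((t) => normalizeStackToken(t) === key);
}
```

Applying the same fold on both sides of the comparison means a query for "postgresql" still matches a row stored as "PostgreSQL".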

§2 enum / set definitions

tier (company stage, 7 values)

global_bigtech / domestic_bigtech / unicorn / enterprise / growth_stage / mid_sized / early_stage

SQL pool tier_weight CASE (sparsity prior):

  • global_bigtech 2.0
  • domestic_bigtech 1.8
  • unicorn 1.5
  • growth_stage 1.2
  • (else) 1.0

The JS final-stage tier_weight is fixed at 1.0 (dead code, see M-tier-weight-audit: restoring the multiplier resulted in -2 hits).
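
For readers who prefer code to a bullet list, the SQL pool CASE above can be mirrored as a small lookup. This is only an illustrative sketch; `sqlPoolTierWeight` is a hypothetical name and the actual weighting lives in SQL.

```typescript
type Tier =
  | "global_bigtech"
  | "domestic_bigtech"
  | "unicorn"
  | "enterprise"
  | "growth_stage"
  | "mid_sized"
  | "early_stage";

// Mirrors the SQL CASE: tiers not listed explicitly fall through to 1.0.
function sqlPoolTierWeight(tier: Tier): number {
  switch (tier) {
    case "global_bigtech":
      return 2.0;
    case "domestic_bigtech":
      return 1.8;
    case "unicorn":
      return 1.5;
    case "growth_stage":
      return 1.2;
    default:
      return 1.0; // enterprise, mid_sized, early_stage
  }
}
```

Note that enterprise, mid_sized, and early_stage all share the default weight of 1.0.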

customer_type (target_market)

b2b / b2c / b2b2c / b2g

Activation condition for the b2b_saas domain proxy (build-filter.ts fetchB2bSaasProxyIds):

  • customer_type=b2b OR service_archetypes includes saas
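
Read as a per-company predicate, the condition above could be sketched as follows. The interface and function names are hypothetical, and this assumes the check is applied row by row; the actual fetchB2bSaasProxyIds presumably expresses it as a query.

```typescript
// Hypothetical subset of a companies row, limited to the two columns the condition uses.
interface CompanyBmSignals {
  customer_type: "b2b" | "b2c" | "b2b2c" | "b2g" | null;
  service_archetypes: string[];
}

// True when either side of the OR condition holds.
function qualifiesAsB2bSaasProxy(c: CompanyBmSignals): boolean {
  return c.customer_type === "b2b" || c.service_archetypes.includes("saas");
}
```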

service_archetypes[] (bm)

saas / marketplace / d2c / platform / app_service / commerce / ... (enum set)

revenue_models_v2[] (bm, G4 result)

subscription_recurring / transaction_commission / ads_display / license / ...

The parser must normalize these values (revenue-model-ontology.ts): Korean free text → English enum.
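
A rough sketch of the Korean free text → English enum step, assuming simple keyword matching. The keyword table and `normalizeRevenueModels` are illustrative assumptions; the real mapping is defined in revenue-model-ontology.ts and is likely richer.

```typescript
// Only the four enum values named in this document; the real set is larger.
type RevenueModelV2 =
  | "subscription_recurring"
  | "transaction_commission"
  | "ads_display"
  | "license";

// Assumed keyword → enum pairs, for illustration only.
const REVENUE_KEYWORDS: Array<[string, RevenueModelV2]> = [
  ["구독", "subscription_recurring"],
  ["수수료", "transaction_commission"],
  ["광고", "ads_display"],
  ["라이선스", "license"],
];

// Returns each matched enum value once, in keyword-table order.
function normalizeRevenueModels(freeText: string): RevenueModelV2[] {
  const found = new Set<RevenueModelV2>();
  for (const [keyword, model] of REVENUE_KEYWORDS) {
    if (freeText.includes(keyword)) found.add(model);
  }
  return Array.from(found);
}
```

Text that matches no keyword yields an empty array, so callers can tell "nothing recognized" apart from a valid enum.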

service_regions[] (target_market)

KR / GLOBAL / 미국 (US) / EU / 일본 (Japan) / ASEAN / ...

domains[] slug (grounding)

30+ slugs, e.g. fintech / fintech_payment / fintech_lending / beauty_fmcg / b2b_saas / adtech / ai_deeptech / ...

These are matched against expected_ids in docs/data/g5-testset.json.

narrow_segment (grounding)

281 fine-grained segments, e.g. fintech_payment_pg / airline_seat_booking / beauty_skincare_d2c / ...

ontology v1.6 source: apps/web/src/lib/segment-ontology.ts.


§3 Matching a search query to the 6 dims

Examples of natural-language query → parser → 6-dim matching:

| query | matched dims |
|---|---|
| "광고 매출 메인 BM 인 한국 디지털 스타트업" (Korean digital startups whose main BM is ad revenue) | bm.revenue_models_v2 (ads_display) + target_market.service_regions (KR) + grounding.domains (digital) + scale.tier (early/growth) |
| "30~40대 여성 타겟 뷰티 D2C" (beauty D2C targeting women in their 30s and 40s) | target_market.target_age_bands (thirties_to_forties) + target_market.target_gender (female) + bm.service_archetypes (d2c) + grounding.domains (beauty_fmcg) |
| "엔터프라이즈 SaaS 한국 unicorn" (enterprise SaaS, Korean unicorn) | scale.tier (unicorn) + target_market.customer_type (b2b) + bm.service_archetypes (saas) + service_regions (KR) |

composeWhereClause in build-filter.ts composes the matches above into a drizzle ORM WHERE clause.
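
To make the composition step concrete, here is a dependency-free sketch that turns a parsed filter into a parameterized SQL condition. It does not use drizzle, and it assumes PostgreSQL syntax and that dims are AND-ed together; `ParsedFilter`, `WhereClause`, and `composeWhereSketch` are hypothetical names, not the actual composeWhereClause.

```typescript
// Hypothetical parser output, limited to a few of the 6-dim columns.
interface ParsedFilter {
  tier?: string[];
  customer_type?: string;
  service_archetypes?: string[];
  revenue_models_v2?: string[];
  service_regions?: string[];
}

interface WhereClause {
  sql: string;
  params: unknown[];
}

function composeWhereSketch(f: ParsedFilter): WhereClause {
  const parts: string[] = [];
  const params: unknown[] = [];
  // Registers a bound value and returns its positional placeholder ($1, $2, ...).
  const bind = (value: unknown): string => {
    params.push(value);
    return `$${params.length}`;
  };

  // Scalar columns: equality, or membership when several values are allowed.
  if (f.tier?.length) parts.push(`tier = ANY(${bind(f.tier)})`);
  if (f.customer_type) parts.push(`customer_type = ${bind(f.customer_type)}`);

  // Array columns: match when the column overlaps the requested values.
  if (f.service_archetypes?.length) {
    parts.push(`service_archetypes && ${bind(f.service_archetypes)}`);
  }
  if (f.revenue_models_v2?.length) {
    parts.push(`revenue_models_v2 && ${bind(f.revenue_models_v2)}`);
  }
  if (f.service_regions?.length) {
    parts.push(`service_regions && ${bind(f.service_regions)}`);
  }

  // An empty filter matches every row.
  return { sql: parts.length ? parts.join(" AND ") : "TRUE", params };
}

// The third example query above, expressed as a filter.
const example = composeWhereSketch({
  tier: ["unicorn"],
  customer_type: "b2b",
  service_archetypes: ["saas"],
  service_regions: ["KR"],
});
```

Binding values as parameters rather than interpolating them keeps parser output out of the SQL text itself.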


§4 G4 cascade statistics (2026-05-22)

  • 7,023 companies enriched (LLM gpt-5.4-mini + WebSearch Phase 2)
  • 6-dim attributes recorded for 5,624 rows (Tier A + B + part of C)
  • recall@200: 79.8% by default / 82.4% with opt-in llm_rerank (testset of 11 queries × 3 dims)
  • cumulative cost ~$367 ($368.58 after m-q-3b)
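
For reference, recall@k for a single query is the share of expected IDs that appear in the top k results. The sketch below shows that per-query computation only; `recallAtK` is a hypothetical name, and how the project aggregates across the testset is not stated here.

```typescript
// Fraction of expectedIds found among the first k entries of rankedIds.
function recallAtK(rankedIds: string[], expectedIds: string[], k: number): number {
  if (expectedIds.length === 0) return 0; // nothing to recall
  const topK = new Set(rankedIds.slice(0, k));
  const hits = expectedIds.filter((id) => topK.has(id)).length;
  return hits / expectedIds.length;
}
```

With ranked results a, b, c, d and expected IDs a, d, z, the value is 1/3 at k = 3 and 2/3 at k = 4.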

Detailed per-dimension scoreboard (after m-q-3b):

  • Domain dimension: 85.8%
  • BM dimension: 78.7% (G20)
  • Target dimension: 81.1%
  • Composite dimension: 76.0%

§5 Core DB schema tables

| table | responsibility | key columns |
|---|---|---|
| companies | company basics + 6-dim structured attributes | id / name_ko / name_en / brand_name / aliases[] / tier / 6 dim columns (see §1) / is_excluded_from_search / manually_corrected_at |
| company_profiles | company description prose + embedding | company_id / source enum (llm_enriched / manual / homepage / innoforest) / content text / embedding vector(1536) / status |
| domains | domain slug definitions | id / slug / name_ko |
| company_domains | company ↔ domain N:N + primary flag | company_id / domain_id / is_primary |
| search_history | search call cache + audit | cache_key / normalized_query / filters_snapshot json / engine_flags_snapshot json / response json / feedback fields (PMF-2) |
| corrections | accumulated F-R3 propose-correction entries | company_id / proposed_by / fields_changed json / applied_at |
| api_keys + api_usage_log | external API (H-2/H-3/H-5) | key_hash / scope / rate_limit_rpm / per-call usage |
| users + sessions | NextAuth v5 (H-1) | email / password_hash / active / role |

drizzle schema source: packages/db/src/schema.ts.
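
As an illustration of the N:N relation with a primary flag, the sketch below resolves a company's primary domain slug from in-memory rows. The row shapes follow the columns listed above, but the numeric ID types and the `primaryDomainSlug` helper are assumptions; the authoritative definitions are in packages/db/src/schema.ts.

```typescript
// Row shapes based on the key columns above; ID types are assumed to be numbers.
interface DomainRow {
  id: number;
  slug: string;
  name_ko: string;
}

interface CompanyDomainRow {
  company_id: number;
  domain_id: number;
  is_primary: boolean;
}

// Returns the slug of the company's primary domain, or undefined if none is flagged.
function primaryDomainSlug(
  companyId: number,
  links: CompanyDomainRow[],
  domains: DomainRow[],
): string | undefined {
  const link = links.find((l) => l.company_id === companyId && l.is_primary);
  if (!link) return undefined;
  return domains.find((d) => d.id === link.domain_id)?.slug;
}
```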


§6 Preventing ontology drift

Required whenever an ontology is added or changed:

  1. Update the apps/web/src/lib/*-ontology.ts source
  2. Update the parser SYSTEM_PROMPT (or the deterministic extraction layer, enforceDeterministicSignalExtraction) in the same change, per the feedback_parser_prompt_global_perturb (FP-15) rule
  3. Measure the impact on expected_ids in docs/data/g5-testset.json
  4. Pass the dual-call eval gate (m-q-3b)

Details: docs/dev/eval-cycle.md.


§7 Further reference