Builder
These classes and methods make it easier to construct MetadataChangeProposals and MetadataChangeEvents.
- class datahub.emitter.mcp.MetadataChangeProposalWrapper(entityType='ENTITY_TYPE_UNSET', changeType='UPSERT', entityUrn=None, entityKeyAspect=None, auditHeader=None, aspectName=None, aspect=None, systemMetadata=None)
Bases:
object
- Parameters:
entityType (str)
changeType (Union[str, ChangeTypeClass])
entityUrn (Optional[str])
entityKeyAspect (Optional[_Aspect])
auditHeader (Optional[KafkaAuditHeaderClass])
aspectName (Optional[str])
aspect (Optional[_Aspect])
systemMetadata (Optional[SystemMetadataClass])
- entityType: str = 'ENTITY_TYPE_UNSET'
- changeType: Union[str, ChangeTypeClass] = 'UPSERT'
- entityUrn: Optional[str] = None
- entityKeyAspect: Optional[_Aspect] = None
- auditHeader: Optional[KafkaAuditHeaderClass] = None
- aspectName: Optional[str] = None
- aspect: Optional[_Aspect] = None
- systemMetadata: Optional[SystemMetadataClass] = None
- classmethod construct_many(entityUrn, aspects)
- Parameters:
entityUrn (str)
aspects (Sequence[Optional[_Aspect]])
- Return type:
List[MetadataChangeProposalWrapper]
- make_mcp()
- Return type:
MetadataChangeProposalClass
- validate()
- Return type:
bool
- to_obj(tuples=False, simplified_structure=False)
- Parameters:
tuples (bool)
simplified_structure (bool)
- Return type:
dict
- classmethod from_obj(obj, tuples=False)
Attempts to deserialize into a MetadataChangeProposalWrapper (MCPW), but falls back to a standard MetadataChangeProposalClass (MCP) if the code-generated classes for the entity key or aspect are missing.
- Parameters:
obj (dict)
tuples (bool)
- Return type:
Union[MetadataChangeProposalWrapper, MetadataChangeProposalClass]
- classmethod try_from_mcpc(mcpc)
Attempts to create a MetadataChangeProposalWrapper from a MetadataChangeProposalClass. Gracefully handles expected but unsupported cases, such as unknown aspect types or a non-JSON content type.
- Raises:
Exception if the generic aspect is invalid, e.g. contains invalid JSON.
- Parameters:
mcpc (MetadataChangeProposalClass)
- Return type:
Optional[MetadataChangeProposalWrapper]
- classmethod try_from_mcl(mcl)
- Parameters:
mcl (MetadataChangeLogClass)
- Return type:
Union[MetadataChangeProposalWrapper, MetadataChangeProposalClass]
- classmethod from_obj_require_wrapper(obj, tuples=False)
- Parameters:
obj (dict)
tuples (bool)
- Return type:
MetadataChangeProposalWrapper
- as_workunit(*, treat_errors_as_warnings=False, is_primary_source=True)
- Parameters:
treat_errors_as_warnings (bool)
is_primary_source (bool)
- Return type:
MetadataWorkUnit
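The contract described for try_from_mcpc (an Optional result for expected but unsupported input, and an exception for invalid JSON) can be illustrated without the library. The sketch below is only an analogy: `RawAspect`, `KNOWN_ASPECTS`, and `try_decode` are hypothetical names, not part of DataHub, and the real method works on MetadataChangeProposalClass objects rather than this simplified record.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry standing in for the set of aspect types with generated classes.
KNOWN_ASPECTS = {"status", "ownership"}


@dataclass
class RawAspect:
    """Hypothetical serialized aspect: a name, a content type, and raw bytes."""

    name: str
    content_type: str
    value: bytes


def try_decode(raw: RawAspect) -> Optional[dict]:
    # Expected but unsupported inputs yield None instead of raising.
    if raw.name not in KNOWN_ASPECTS or raw.content_type != "application/json":
        return None
    # Malformed JSON is treated as a genuine error, so json.loads may raise here.
    return json.loads(raw.value.decode("utf-8"))


decoded = try_decode(RawAspect("status", "application/json", b'{"removed": false}'))
skipped = try_decode(RawAspect("status", "application/avro", b""))
```

Callers of a function shaped like this need to handle both outcomes: check for None on the normal path and reserve exception handling for corrupt payloads.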
Convenience functions for creating MCEs
- datahub.emitter.mce_builder.set_dataset_urn_to_lower(value)
- Parameters:
value (bool)
- Return type:
None
- class datahub.emitter.mce_builder.OwnerType(value)
Bases:
Enum
An enumeration.
- USER = 'corpuser'
- GROUP = 'corpGroup'
- datahub.emitter.mce_builder.get_sys_time()
- Return type:
int
- datahub.emitter.mce_builder.make_ts_millis(ts)
- Parameters:
ts (Optional[datetime])
- Return type:
Optional[int]
- datahub.emitter.mce_builder.parse_ts_millis(ts)
- Parameters:
ts (Optional[float])
- Return type:
Optional[datetime]
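As a rough illustration of the round trip that make_ts_millis and parse_ts_millis suggest, the sketch below converts a datetime to epoch milliseconds and back using only the standard library. The names `to_millis` and `from_millis` are hypothetical stand-ins, and interpreting the parsed value as UTC is an assumption rather than behavior documented on this page.

```python
from datetime import datetime, timezone
from typing import Optional


def to_millis(ts: Optional[datetime]) -> Optional[int]:
    """Hypothetical stand-in for make_ts_millis: datetime -> epoch milliseconds."""
    if ts is None:
        return None
    return int(ts.timestamp() * 1000)


def from_millis(ts: Optional[float]) -> Optional[datetime]:
    """Hypothetical stand-in for parse_ts_millis: epoch milliseconds -> datetime."""
    if ts is None:
        return None
    # Assumption: the millisecond value is interpreted as UTC.
    return datetime.fromtimestamp(ts / 1000, tz=timezone.utc)


moment = datetime(2024, 1, 1, tzinfo=timezone.utc)
millis = to_millis(moment)
restored = from_millis(millis)
```

Passing None through unchanged mirrors the Optional types in both signatures.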
- datahub.emitter.mce_builder.make_data_platform_urn(platform)
- Parameters:
platform (str)
- Return type:
str
- datahub.emitter.mce_builder.make_dataset_urn(platform, name, env='PROD')
- Parameters:
platform (str)
name (str)
env (str)
- Return type:
str
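To make the shape of the returned string concrete, here is a minimal sketch of the dataset URN layout, assuming the usual `urn:li:dataset:(<platform urn>,<name>,<env>)` form. `sketch_dataset_urn` is a hypothetical re-creation, not the library function, and it does not model the lowercasing toggle controlled by set_dataset_urn_to_lower.

```python
def sketch_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical re-creation of the dataset URN layout (not the library code)."""
    # The platform is itself expressed as a nested URN.
    platform_urn = f"urn:li:dataPlatform:{platform}"
    return f"urn:li:dataset:({platform_urn},{name},{env})"


urn = sketch_dataset_urn("hive", "db.schema.table")
print(urn)
```

In real code, prefer the library helper over string formatting so that any normalization it applies stays consistent across producers.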
- datahub.emitter.mce_builder.make_dataplatform_instance_urn(platform, instance)
- Parameters:
platform (str)
instance (str)
- Return type:
str
- datahub.emitter.mce_builder.make_dataset_urn_with_platform_instance(platform, name, platform_instance, env='PROD')
- Parameters:
platform (str)
name (str)
platform_instance (Optional[str])
env (str)
- Return type:
str
- datahub.emitter.mce_builder.make_schema_field_urn(parent_urn, field_path)
- Parameters:
parent_urn (str)
field_path (str)
- Return type:
str
- datahub.emitter.mce_builder.schema_field_urn_to_key(schema_field_urn)
- Parameters:
schema_field_urn (str)
- Return type:
Optional[SchemaFieldKeyClass]
- datahub.emitter.mce_builder.dataset_urn_to_key(dataset_urn)
- Parameters:
dataset_urn (str)
- Return type:
Optional[DatasetKeyClass]
- datahub.emitter.mce_builder.dataset_key_to_urn(key)
- Parameters:
key (DatasetKeyClass)
- Return type:
str
- datahub.emitter.mce_builder.make_container_urn(guid)
- Parameters:
guid (Union[str, DatahubKey])
- Return type:
str
- datahub.emitter.mce_builder.container_urn_to_key(guid)
- Parameters:
guid (str)
- Return type:
Optional[ContainerKeyClass]
- datahub.emitter.mce_builder.datahub_guid(obj)
- Parameters:
obj (dict)
- Return type:
str
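One way to derive a stable identifier from a dict, in the spirit of datahub_guid, is to hash a canonical JSON encoding. The sketch below assumes an MD5 digest over key-sorted, compact JSON; that choice is an assumption for illustration and is not confirmed by this page. `sketch_guid` is a hypothetical name.

```python
import hashlib
import json


def sketch_guid(obj: dict) -> str:
    """Hypothetical stand-in for datahub_guid: hash a canonical JSON encoding."""
    # Sorting keys makes the digest independent of dict insertion order.
    canonical = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


a = sketch_guid({"platform": "snowflake", "database": "sales"})
b = sketch_guid({"database": "sales", "platform": "snowflake"})
```

The property that matters is determinism: equal dicts produce equal identifiers regardless of key order, which is what lets a key object map to the same container each time.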
- datahub.emitter.mce_builder.make_assertion_urn(assertion_id)
- Parameters:
assertion_id (str)
- Return type:
str
- datahub.emitter.mce_builder.assertion_urn_to_key(assertion_urn)
- Parameters:
assertion_urn (str)
- Return type:
Optional[AssertionKeyClass]
- datahub.emitter.mce_builder.make_user_urn(username)
Makes a user URN if the input is not already a user or group URN.
- Parameters:
username (str)
- Return type:
str
- datahub.emitter.mce_builder.make_group_urn(groupname)
Makes a group URN if the input is not already a user or group URN.
- Parameters:
groupname (str)
- Return type:
str
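The "only if not already a URN" behavior described for make_user_urn and make_group_urn makes these helpers safe to call repeatedly on the same value. A minimal sketch of that behavior follows, assuming the URN prefixes are built from the OwnerType values listed above (`corpuser` and `corpGroup`); the function names are hypothetical.

```python
USER_PREFIX = "urn:li:corpuser:"
GROUP_PREFIX = "urn:li:corpGroup:"


def sketch_user_urn(username: str) -> str:
    """Hypothetical stand-in for make_user_urn."""
    # Inputs that are already user or group URNs pass through untouched.
    if username.startswith((USER_PREFIX, GROUP_PREFIX)):
        return username
    return f"{USER_PREFIX}{username}"


def sketch_group_urn(groupname: str) -> str:
    """Hypothetical stand-in for make_group_urn."""
    if groupname.startswith((USER_PREFIX, GROUP_PREFIX)):
        return groupname
    return f"{GROUP_PREFIX}{groupname}"


once = sketch_user_urn("jdoe")
twice = sketch_user_urn(once)
```

Note that a group URN passed to the user helper comes back as a group URN, so the result is not guaranteed to be a user URN.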
- datahub.emitter.mce_builder.make_tag_urn(tag)
Makes a tag URN if the input is not already a tag URN.
- Parameters:
tag (str)
- Return type:
str
- datahub.emitter.mce_builder.make_owner_urn(owner, owner_type)
- Parameters:
owner (str)
owner_type (OwnerType)
- Return type:
str
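The OwnerType values shown earlier (`corpuser`, `corpGroup`) read naturally as the entity-type segment of a URN. The sketch below assumes make_owner_urn simply interpolates that value; `OwnerKind` and `sketch_owner_urn` are hypothetical names used so the example runs without the library.

```python
from enum import Enum


class OwnerKind(Enum):
    """Hypothetical mirror of OwnerType, using the values listed on this page."""

    USER = "corpuser"
    GROUP = "corpGroup"


def sketch_owner_urn(owner: str, owner_type: OwnerKind) -> str:
    # Assumption: the enum value becomes the entity-type segment of the URN.
    return f"urn:li:{owner_type.value}:{owner}"


user_urn = sketch_owner_urn("jdoe", OwnerKind.USER)
group_urn = sketch_owner_urn("data-eng", OwnerKind.GROUP)
```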
- datahub.emitter.mce_builder.make_ownership_type_urn(type)
- Parameters:
type (str)
- Return type:
str
- datahub.emitter.mce_builder.make_term_urn(term)
Makes a term URN if the input is not already a term URN.
- Parameters:
term (str)
- Return type:
str
- datahub.emitter.mce_builder.make_data_flow_urn(orchestrator, flow_id, cluster='prod', platform_instance=None)
- Parameters:
orchestrator (str)
flow_id (str)
cluster (str)
platform_instance (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.make_data_job_urn_with_flow(flow_urn, job_id)
- Parameters:
flow_urn (str)
job_id (str)
- Return type:
str
- datahub.emitter.mce_builder.make_data_process_instance_urn(dataProcessInstanceId)
- Parameters:
dataProcessInstanceId (str)
- Return type:
str
- datahub.emitter.mce_builder.make_data_job_urn(orchestrator, flow_id, job_id, cluster='prod', platform_instance=None)
- Parameters:
orchestrator (str)
flow_id (str)
job_id (str)
cluster (str)
platform_instance (Optional[str])
- Return type:
str
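The flow and job helpers compose: a data job URN nests the URN of the flow it belongs to. The sketch below shows that nesting under the assumption that the layouts are `urn:li:dataFlow:(orchestrator,flow_id,cluster)` and `urn:li:dataJob:(flow_urn,job_id)`. The `sketch_*` functions are hypothetical, and the `platform_instance` parameter is deliberately left out because its effect on the URN is not described on this page.

```python
def sketch_data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    """Hypothetical re-creation of the data flow URN layout."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def sketch_data_job_urn_with_flow(flow_urn: str, job_id: str) -> str:
    """Hypothetical re-creation of the data job URN layout."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


def sketch_data_job_urn(
    orchestrator: str, flow_id: str, job_id: str, cluster: str = "prod"
) -> str:
    # Build the flow URN first, then wrap it in the job URN.
    flow_urn = sketch_data_flow_urn(orchestrator, flow_id, cluster)
    return sketch_data_job_urn_with_flow(flow_urn, job_id)


job_urn = sketch_data_job_urn("airflow", "daily_etl", "load_orders")
```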
- datahub.emitter.mce_builder.make_dashboard_urn(platform, name, platform_instance=None)
- Parameters:
platform (str)
name (str)
platform_instance (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.dashboard_urn_to_key(dashboard_urn)
- Parameters:
dashboard_urn (str)
- Return type:
Optional[DashboardKeyClass]
- datahub.emitter.mce_builder.make_chart_urn(platform, name, platform_instance=None)
- Parameters:
platform (str)
name (str)
platform_instance (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.chart_urn_to_key(chart_urn)
- Parameters:
chart_urn (str)
- Return type:
Optional[ChartKeyClass]
- datahub.emitter.mce_builder.make_domain_urn(domain)
- Parameters:
domain (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_primary_key_urn(feature_table_name, primary_key_name)
- Parameters:
feature_table_name (str)
primary_key_name (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_feature_urn(feature_table_name, feature_name)
- Parameters:
feature_table_name (str)
feature_name (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_feature_table_urn(platform, feature_table_name)
- Parameters:
platform (str)
feature_table_name (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_model_urn(platform, model_name, env)
- Parameters:
platform (str)
model_name (str)
env (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_model_deployment_urn(platform, deployment_name, env)
- Parameters:
platform (str)
deployment_name (str)
env (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_model_group_urn(platform, group_name, env)
- Parameters:
platform (str)
group_name (str)
env (str)
- Return type:
str
- datahub.emitter.mce_builder.validate_ownership_type(ownership_type)
- Parameters:
ownership_type (str)
- Return type:
Tuple[str, Optional[str]]
- datahub.emitter.mce_builder.make_lineage_mce(upstream_urns, downstream_urn, lineage_type='TRANSFORMED')
Note: this function only supports lineage for dataset aspects. It will not update lineage for any other aspect types.
- Parameters:
upstream_urns (List[str])
downstream_urn (str)
lineage_type (str)
- Return type:
MetadataChangeEventClass
- datahub.emitter.mce_builder.can_add_aspect_to_snapshot(SnapshotType, AspectType)
- Parameters:
SnapshotType (Type[DictWrapper])
AspectType (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
bool
- datahub.emitter.mce_builder.can_add_aspect(mce, AspectType)
- Parameters:
mce (MetadataChangeEventClass)
AspectType (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
bool
- datahub.emitter.mce_builder.assert_can_add_aspect(mce, AspectType)
- Parameters:
mce (MetadataChangeEventClass)
AspectType (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
None
- datahub.emitter.mce_builder.get_aspect_if_available(mce, AspectType)
- Parameters:
mce (MetadataChangeEventClass)
AspectType (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
Optional[TypeVar(Aspect, bound=_Aspect)]
- datahub.emitter.mce_builder.remove_aspect_if_available(mce, aspect_type)
- Parameters:
mce (MetadataChangeEventClass)
aspect_type (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
bool
- datahub.emitter.mce_builder.get_or_add_aspect(mce, default)
- Parameters:
mce (MetadataChangeEventClass)
default (TypeVar(Aspect, bound=_Aspect))
- Return type:
TypeVar(Aspect, bound=_Aspect)
- datahub.emitter.mce_builder.make_global_tag_aspect_with_tag_list(tags)
- Parameters:
tags (List[str])
- Return type:
GlobalTagsClass
- datahub.emitter.mce_builder.make_ownership_aspect_from_urn_list(owner_urns, source_type, owner_type='DATAOWNER')
- Parameters:
owner_urns (List[str])
source_type (Union[str, OwnershipSourceTypeClass, None])
owner_type (Union[str, OwnershipTypeClass])
- Return type:
OwnershipClass
- datahub.emitter.mce_builder.make_glossary_terms_aspect_from_urn_list(term_urns)
- Parameters:
term_urns (List[str])
- Return type:
GlossaryTermsClass
- datahub.emitter.mce_builder.set_aspect(mce, aspect, aspect_type)
Sets the aspect to the provided value, overwriting any existing aspect of that type. If the provided aspect is None, the existing aspect is removed.
- Parameters:
mce (MetadataChangeEventClass)
aspect (Optional[TypeVar(Aspect, bound=_Aspect)])
aspect_type (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
None
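The overwrite-or-remove semantics of set_aspect can be modeled on a plain list of aspect objects. This is a simplified sketch, not the library code: it assumes an MCE carries at most one aspect per type in a list, and `Ownership`, `Tags`, and `sketch_set_aspect` are hypothetical stand-ins.

```python
from typing import List, Optional, Type, TypeVar

T = TypeVar("T")


class Ownership:
    """Hypothetical aspect type carrying a list of owners."""

    def __init__(self, owners: List[str]) -> None:
        self.owners = owners


class Tags:
    """Hypothetical aspect type with no fields."""


def sketch_set_aspect(aspects: List[object], aspect: Optional[T], aspect_type: Type[T]) -> None:
    # Drop any existing aspect of this type, mutating the list in place.
    aspects[:] = [a for a in aspects if not isinstance(a, aspect_type)]
    # A None value means "remove only"; otherwise the new value is appended.
    if aspect is not None:
        aspects.append(aspect)


aspects: List[object] = [Ownership(["alice"]), Tags()]
sketch_set_aspect(aspects, Ownership(["bob"]), Ownership)
owners_after_overwrite = [a.owners for a in aspects if isinstance(a, Ownership)]
sketch_set_aspect(aspects, None, Ownership)
```

Passing the type separately from the value is what makes the None case workable, since None alone would not say which aspect to remove.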
- class datahub.emitter.mcp_builder.DatahubKey(**data)
Bases:
BaseModel
- Parameters:
data (Any)
- guid_dict()
- Return type:
Dict[str, str]
- guid()
- Return type:
str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.ContainerKey(**data)
Bases:
DatahubKey
Base class for container guid keys. Most users should use one of the subclasses instead.
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
- platform: str
- instance: Optional[str]
- env: Optional[str]
- backcompat_env_as_instance: bool
- guid_dict()
- Return type:
Dict[str, str]
- property_dict()
- Return type:
Dict[str, str]
- as_urn()
- Return type:
str
- parent_key()
- Return type:
Optional[ContainerKey]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- datahub.emitter.mcp_builder.PlatformKey
alias of ContainerKey
- class datahub.emitter.mcp_builder.DatabaseKey(**data)
Bases:
ContainerKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
database (str)
- database: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.SchemaKey(**data)
Bases:
DatabaseKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
database (str)
schema (str)
- db_schema: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.ProjectIdKey(**data)
Bases:
ContainerKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
project_id (str)
- project_id: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.MetastoreKey(**data)
Bases:
ContainerKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
metastore (str)
- metastore: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.CatalogKeyWithMetastore(**data)
Bases:
MetastoreKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
metastore (str)
catalog (str)
- catalog: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.UnitySchemaKeyWithMetastore(**data)
Bases:
CatalogKeyWithMetastore
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
metastore (str)
catalog (str)
unity_schema (str)
- unity_schema: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.CatalogKey(**data)
Bases:
ContainerKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
catalog (str)
- catalog: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.UnitySchemaKey(**data)
Bases:
CatalogKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
catalog (str)
unity_schema (str)
- unity_schema: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.BigQueryDatasetKey(**data)
Bases:
ProjectIdKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
project_id (str)
dataset_id (str)
- dataset_id: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.FolderKey(**data)
Bases:
ContainerKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
folder_abs_path (str)
- folder_abs_path: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.BucketKey(**data)
Bases:
ContainerKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
bucket_name (str)
- bucket_name: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class datahub.emitter.mcp_builder.NotebookKey(**data)
Bases:
DatahubKey
- Parameters:
data (Any)
notebook_id (int)
platform (str)
instance (str | None)
- notebook_id: int
- platform: str
- instance: Optional[str]
- as_urn()
- Return type:
str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- datahub.emitter.mcp_builder.add_domain_to_entity_wu(entity_urn, domain_urn)
- Parameters:
entity_urn (str)
domain_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_owner_to_entity_wu(entity_type, entity_urn, owner_urn)
- Parameters:
entity_type (str)
entity_urn (str)
owner_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_tags_to_entity_wu(entity_type, entity_urn, tags)
- Parameters:
entity_type (str)
entity_urn (str)
tags (List[str])
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_structured_properties_to_entity_wu(entity_urn, structured_properties)
- Parameters:
entity_urn (str)
structured_properties (Dict[StructuredPropertyUrn, str])
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.gen_containers(container_key, name, sub_types, parent_container_key=None, extra_properties=None, structured_properties=None, domain_urn=None, description=None, owner_urn=None, external_url=None, tags=None, qualified_name=None, created=None, last_modified=None)
- Parameters:
container_key (TypeVar(KeyType, bound=ContainerKey))
name (str)
sub_types (List[str])
parent_container_key (Optional[ContainerKey])
extra_properties (Optional[Dict[str, str]])
structured_properties (Optional[Dict[StructuredPropertyUrn, str]])
domain_urn (Optional[str])
description (Optional[str])
owner_urn (Optional[str])
external_url (Optional[str])
tags (Optional[List[str]])
qualified_name (Optional[str])
created (Optional[int])
last_modified (Optional[int])
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_dataset_to_container(container_key, dataset_urn)
- Parameters:
container_key (TypeVar(KeyType, bound=ContainerKey))
dataset_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_entity_to_container(container_key, entity_type, entity_urn)
- Parameters:
container_key (TypeVar(KeyType, bound=ContainerKey))
entity_type (str)
entity_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.mcps_from_mce(mce)
- Parameters:
mce (MetadataChangeEventClass)
- Return type:
Iterable[MetadataChangeProposalWrapper]
- datahub.emitter.mcp_builder.create_embed_mcp(urn, embed_url)
- Parameters:
urn (str)
embed_url (str)
- Return type:
MetadataChangeProposalWrapper
- datahub.emitter.mcp_builder.entity_supports_aspect(entity_type, aspect_type)
- Parameters:
entity_type (str)
aspect_type (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
bool