Session API

Functions

cleanup(name, namespace[, persistent])

Destroys Table(s).

generateUniqueId()

Generates a unique ID each time it is invoked.

get_data_table(name, namespace)

Returns the data table instance identified by table name and table namespace.

get_data_table_meta(key, data_table_name, …)

Gets meta keyed by key from meta table associated with table named data_table_name and namespaced data_table_namespace.

get_data_table_metas(data_table_name, …)

Gets metas from meta table associated with table named data_table_name and namespaced data_table_namespace.

get_session_id()

Returns session id.

init([job_id, eggroll_version, set_log_dir])

Initializes the session. Must be called before any other session function.

parallelize(*args, **kwargs)

Transforms existing iterable data into a Table.

save_data(kv_data, name, namespace[, …])

Saves data to a table, optionally adding a version log.

save_data_table_meta(kv, data_table_name, …)

Saves metas(in kv) to meta table associated with the table named data_table_name and namespaced data_table_namespace.

stop()

Stops the session and cleans up all tables associated with it.

table(*args, **kwargs)

Loads an existing Table.

init(job_id=None, mode: int = <WorkMode.STANDALONE: 0>, backend: int = <Backend.EGGROLL: 0>, persistent_engine: str = 'LMDB', eggroll_version=None, set_log_dir=True, options: dict = None)

Initializes the session. Must be called before any other session function.

Parameters
  • job_id (string) – job id and default table namespace of this runtime.

  • mode (WorkMode) –

    Sets the work mode:

    • standalone: WorkMode.STANDALONE or 0

    • cluster: WorkMode.CLUSTER or 1

  • backend (Backend) –

    Sets the computing backend:

    • eggroll: Backend.EGGROLL or 0

    • spark: Backend.SPARK or 1

  • options (None or dict) – additional options

Returns

Nothing.

Return type

None

Examples

>>> from arch.api import session, WorkMode, Backend
>>> session.init("a_job_id", WorkMode.STANDALONE, Backend.EGGROLL)
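The parameter list above treats an enum member and its integer value as interchangeable (for example, WorkMode.STANDALONE or 0). A minimal sketch of that equivalence follows. The WorkMode and Backend classes here are stand-in definitions so the snippet runs without FATE installed, and describe is a hypothetical helper; the real classes come from arch.api and may be defined differently.

```python
from enum import IntEnum


# Stand-in definitions mirroring the values documented above.
# Assumption: the real arch.api classes behave like IntEnum.
class WorkMode(IntEnum):
    STANDALONE = 0
    CLUSTER = 1


class Backend(IntEnum):
    EGGROLL = 0
    SPARK = 1


def describe(mode: int, backend: int) -> str:
    """Hypothetical helper: normalize ints or enum members to readable names."""
    return f"{WorkMode(mode).name}/{Backend(backend).name}"


print(describe(0, Backend.EGGROLL))   # STANDALONE/EGGROLL
print(describe(WorkMode.CLUSTER, 1))  # CLUSTER/SPARK
```

Passing the enum member rather than the bare integer keeps call sites self-describing.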
table(*args, **kwargs) → arch.api.base.table.Table

Loads an existing Table.

Parameters
  • name (string) – Table name of result Table.

  • namespace (string) – Table namespace of result Table.

  • partition (int) – Number of partitions when creating new Table.

  • create_if_missing (boolean) – Not implemented. The Table will always be created if it does not exist.

  • error_if_exist (boolean) – Not implemented. No error will be thrown if the Table already exists.

  • persistent (boolean) – Where to load the Table from: True for persistent storage, False for temporary storage.

  • in_place_computing (boolean) – Whether in-place computing is enabled.

Returns

A Table containing the loaded data.

Return type

Table

Examples

>>> from arch.api import session
>>> a = session.table('foo', 'bar', persistent=True)
parallelize(*args, **kwargs) → arch.api.base.table.Table

Transforms existing iterable data into a Table.

Parameters
  • data (Iterable) – Data to be put.

  • include_key (boolean) – Whether to include key when parallelizing data into table.

  • name (string) – Table name of result Table. A default table name will be generated when None is used.

  • partition (int) – Number of partitions when parallelizing data.

  • namespace (string) – Table namespace of result Table. job_id will be used when None is used.

  • create_if_missing (boolean) – Not implemented. The Table will always be created.

  • error_if_exist (boolean) – Not implemented. No error will be thrown if the Table already exists.

  • chunk_size (int) – Batch size when parallelizing data into Table.

  • in_place_computing (boolean) – Whether in-place computing is enabled.

Returns

A Table consisting of parallelized data.

Return type

Table

Examples

>>> from arch.api import session
>>> table = session.parallelize(range(10), in_place_computing=True)
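The include_key and chunk_size parameters can be pictured with a small stand-alone model. This is a sketch under assumptions, not the library's implementation: it assumes that integer keys are generated from the element position when include_key is False, and that data is sent in batches of chunk_size. The helpers to_pairs and in_chunks are hypothetical names.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]


def to_pairs(data: Iterable[Any], include_key: bool = False) -> Iterator[Pair]:
    """Yield (key, value) pairs; generate positional keys when none are supplied."""
    # Assumption: without keys, each element is keyed by its position.
    return iter(data) if include_key else enumerate(data)


def in_chunks(pairs: Iterable[Pair], chunk_size: int) -> Iterator[List[Pair]]:
    """Group pairs into lists of at most chunk_size elements."""
    it = iter(pairs)
    while True:
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        yield batch


batches = list(in_chunks(to_pairs(["a", "b", "c"]), chunk_size=2))
print(batches)  # [[(0, 'a'), (1, 'b')], [(2, 'c')]]
```

With include_key=True the input is expected to already be an iterable of (key, value) pairs and is passed through unchanged.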
cleanup(name, namespace, persistent=False)

Destroys Table(s). Wildcard can be used in name parameter.

Parameters
  • name (string) – Name of the Table(s) to clean up. Wildcards can be used here.

  • namespace (string) – Namespace of the Table(s) to clean up. This must be an exact match.

  • persistent (boolean) – Where to delete the Tables from: True for persistent storage, False for temporary storage.

Returns

Return type

None

Examples

>>> from arch.api import session
>>> session.cleanup('foo*', 'bar', persistent=True)
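The matching rule described above (wildcard on name, exact match on namespace) can be sketched as follows. This is an illustration only: it assumes glob-style wildcards as implemented by Python's fnmatch module, which the source does not confirm, and select_for_cleanup is a hypothetical helper rather than part of the session API.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple


def select_for_cleanup(
    tables: Iterable[Tuple[str, str]], name: str, namespace: str
) -> List[Tuple[str, str]]:
    """Return the (namespace, name) pairs that a cleanup call would target."""
    return [
        (ns, table_name)
        for ns, table_name in tables
        # Namespace must match exactly; only the name may contain wildcards.
        if ns == namespace and fnmatchcase(table_name, name)
    ]


existing = [("bar", "foo1"), ("bar", "foo2"), ("bar", "other"), ("baz", "foo3")]
print(select_for_cleanup(existing, "foo*", "bar"))
# [('bar', 'foo1'), ('bar', 'foo2')]
```

Note that ("baz", "foo3") is left alone even though its name matches, because the namespace differs.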
generateUniqueId()

Generates a unique ID each time it is invoked.

Returns

uniqueId

Return type

string

Examples

>>> from arch.api import session
>>> session.generateUniqueId()
get_session_id()

Returns session id.

Returns

session id

Return type

string

Examples

>>> from arch.api import session
>>> session.get_session_id()
get_data_table(name, namespace)

Returns the data table instance identified by table name and table namespace.

Parameters
  • name (string) – name of the data table

  • namespace (string) – namespace of the data table

Returns

data table instance

Return type

DTable

Examples

>>> from arch.api import session
>>> session.get_data_table(name, namespace)
save_data_table_meta(kv, data_table_name, data_table_namespace)

Saves metas(in kv) to meta table associated with the table named data_table_name and namespaced data_table_namespace.

Parameters
  • kv (dict) – metas to save. Each value (v) should be JSON-serializable.

  • data_table_name (string) – table name of this data table

  • data_table_namespace (string) – table namespace of this data table

Returns

None

Examples

>>> from arch.api import session
>>> session.save_data_table_meta({"model_id": "a_id", "used_framework": "fate"}, "meta", "readme")
get_data_table_meta(key, data_table_name, data_table_namespace)

Gets meta keyed by key from meta table associated with table named data_table_name and namespaced data_table_namespace.

Parameters
  • key (string) – associated key.

  • data_table_name (string) – table name of this data table

  • data_table_namespace (string) – table namespace of this data table

Returns

Object associated with the provided key.

Return type

any

Examples

>>> from arch.api import session
>>> session.get_data_table_meta("model_id", "meta", "readme") # a_id
get_data_table_metas(data_table_name, data_table_namespace)

Gets metas from meta table associated with table named data_table_name and namespaced data_table_namespace.

Parameters
  • data_table_name (string) – table name of this data table

  • data_table_namespace (string) – table namespace of this data table

Returns

metas

Return type

dict

Examples

>>> from arch.api import session
>>> session.get_data_table_metas("meta", "readme") # {'model_id': 'a_id', 'used_framework': 'fate'}
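The three meta functions (save_data_table_meta, get_data_table_meta, get_data_table_metas) form a small key-value store attached to a data table. The sketch below models that behaviour with an in-memory dict so it runs without FATE. It assumes values are stored as JSON strings and decoded on read; the _META store and the save_meta, get_meta and get_metas helpers are hypothetical stand-ins, not the library's code.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the meta tables,
# keyed by (data_table_namespace, data_table_name).
_META: Dict[Tuple[str, str], Dict[str, str]] = {}


def save_meta(kv: Dict[str, Any], name: str, namespace: str) -> None:
    """Store each value as a JSON string under its key."""
    table = _META.setdefault((namespace, name), {})
    for key, value in kv.items():
        table[key] = json.dumps(value)


def get_meta(key: str, name: str, namespace: str) -> Optional[Any]:
    """Return the decoded value for key, or None if it is absent."""
    raw = _META.get((namespace, name), {}).get(key)
    return None if raw is None else json.loads(raw)


def get_metas(name: str, namespace: str) -> Dict[str, Any]:
    """Return all decoded metas for the given table."""
    return {k: json.loads(v) for k, v in _META.get((namespace, name), {}).items()}


save_meta({"model_id": "a_id", "used_framework": "fate"}, "meta", "readme")
print(get_meta("model_id", "meta", "readme"))  # a_id
print(get_metas("meta", "readme"))  # {'model_id': 'a_id', 'used_framework': 'fate'}
```

Because values round-trip through JSON in this model, anything that json.dumps rejects (for example a set) could not be saved as a meta value.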
save_data(kv_data: Iterable, name, namespace, partition=1, persistent: bool = True, create_if_missing=True, error_if_exist=False, in_version: bool = False, version_log=None)

Saves data to a table, optionally adding a version log.

Parameters
  • kv_data (Iterable) – data to be saved

  • name (string) – table name

  • namespace (string) – table namespace

  • partition (int) – Number of partitions when creating new Table.

  • create_if_missing (boolean) – Not implemented. The Table will always be created if it does not exist.

  • error_if_exist (boolean) – Not implemented. No error will be thrown if the Table already exists.

  • persistent (boolean) – Where to save the Table: True for persistent storage, False for temporary storage.

  • in_version (boolean) – whether to add a version log

  • version_log (string) – the version log to add

Returns

Return type

Table

Examples

>>> from arch.api import session
>>> session.save_data([("one", 1), ("two", 2)], "save_data", "readme", in_version=True, version_log="a version")
stop()

Stops the session and cleans up all tables associated with it.

Examples

>>> from arch.api import session
>>> session.stop()
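Since init must come first and stop releases the session's tables, the two calls pair naturally in a try/finally. The sketch below shows one way to wrap them in a context manager. StubSession is a hypothetical stand-in that only records calls so the snippet runs without FATE, and managed_session is an illustrative helper that is not part of arch.api.

```python
from contextlib import contextmanager


class StubSession:
    """Hypothetical stand-in for arch.api.session that records calls."""

    def __init__(self):
        self.calls = []

    def init(self, job_id=None):
        self.calls.append(("init", job_id))

    def stop(self):
        self.calls.append(("stop",))


@contextmanager
def managed_session(session, job_id):
    """Initialize the session on entry and always stop it on exit."""
    session.init(job_id)
    try:
        yield session
    finally:
        # Runs even if the body raises, so session tables are not leaked.
        session.stop()


stub = StubSession()
try:
    with managed_session(stub, "a_job_id"):
        raise RuntimeError("job failed")
except RuntimeError:
    pass
print(stub.calls)  # [('init', 'a_job_id'), ('stop',)]
```

The same wrapper could be given the real session module in place of the stub, assuming its init and stop accept these arguments as documented above.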