Refactoring Data Store Structure #882

Merged
merged 27 commits into from
Aug 22, 2022
Commits
e1737bc
Initial Change of Structure of DataStore
Pushkar-Bhuse Jul 18, 2022
eb4acc9
Added new range annotation
Pushkar-Bhuse Jul 18, 2022
793d671
attribute_data type
Pushkar-Bhuse Jul 18, 2022
0af4edd
Minor fixes
Pushkar-Bhuse Jul 18, 2022
57f3c34
Cache based implementation
Pushkar-Bhuse Jul 29, 2022
7a85ac3
Merge branch 'master' into structure
Pushkar-Bhuse Jul 29, 2022
92e1e10
Grids removal
Pushkar-Bhuse Jul 29, 2022
25fe97d
Pylint Changes
Pushkar-Bhuse Jul 30, 2022
997e338
Removing dataclass default values
Pushkar-Bhuse Jul 31, 2022
adbc5a7
parent and child type fix + removing payload initialization
Pushkar-Bhuse Aug 1, 2022
6aa48ba
Docstrings and review changes
Pushkar-Bhuse Aug 4, 2022
37a4c4e
multipack group error fix
Pushkar-Bhuse Aug 4, 2022
1a4e3f2
Spell Checks
Pushkar-Bhuse Aug 4, 2022
06f02bf
Merge branch 'master' into structure
Pushkar-Bhuse Aug 15, 2022
1fcaebc
Merge fixes
Pushkar-Bhuse Aug 15, 2022
5d9eabd
Cache attribute dict structure modification
Pushkar-Bhuse Aug 16, 2022
4e1b123
Merge branch 'asyml:master' into structure
Pushkar-Bhuse Aug 16, 2022
466161a
test run fixes
Pushkar-Bhuse Aug 16, 2022
25937c5
Merge remote-tracking branch 'origin/structure' into structure
Pushkar-Bhuse Aug 16, 2022
d8cab15
Update base_pack.py
Pushkar-Bhuse Aug 16, 2022
daf912e
making data store validation and initialization more efficient
Pushkar-Bhuse Aug 17, 2022
95c84c0
Merge remote-tracking branch 'origin/structure' into structure
Pushkar-Bhuse Aug 17, 2022
bf4dbf3
test case fix
Pushkar-Bhuse Aug 17, 2022
67568f7
Merge branch 'master' into structure
mylibrar Aug 18, 2022
d402c3f
Minor changes
Pushkar-Bhuse Aug 22, 2022
9ce3e1c
Duplicate code
Pushkar-Bhuse Aug 22, 2022
c5b3af8
Changing parent child attribute type
Pushkar-Bhuse Aug 22, 2022
42 changes: 27 additions & 15 deletions forte/common/constants.py
@@ -1,15 +1,21 @@
# DataStore constants
# The index storing begin location in the internal entry data of DataStore.
BEGIN_INDEX = 0
# The name of the attribute storing the begin location in the internal
# entry data of DataStore.
BEGIN_ATTR_NAME = "begin"

# The index storing end location in the internal entry data of DataStore.
END_INDEX = 1
# The name of the attribute storing the end location in the internal
# entry data of DataStore.
END_ATTR_NAME = "end"

# The index storing tid in the internal entry data of DataStore.
TID_INDEX = 2
TID_INDEX = 0

# The index storing entry type in the internal entry data of DataStore.
ENTRY_TYPE_INDEX = 3
ENTRY_TYPE_INDEX = 1

# The name of the attribute storing the payload index location in the
# internal entry data of DataStore.
PAYLOAD_ID_ATTR_NAME = "payload_idx"

# The index storing entry type (specific to Link and Group type). It is saved
# in the `tid_idx_dict` in DataStore.
@@ -19,20 +25,26 @@
# in the `tid_idx_dict` in DataStore.
ENTRY_DICT_ENTRY_INDEX = 1

# The index storing parent entry tid in Link entries
PARENT_TID_INDEX = 0
# The name of the attribute storing the parent entry tid in Link entries
PARENT_TID_ATTR_NAME = "parent"

# The name of the attribute storing the parent entry type in Link entries
PARENT_TYPE_ATTR_NAME = "parent_type"

# The name of the attribute storing the child entry tid in Link entries
CHILD_TID_ATTR_NAME = "child"

# The index storing child entry tid in Link entries
CHILD_TID_INDEX = 1
# The name of the attribute storing the child entry type in Link entries
CHILD_TYPE_ATTR_NAME = "child_type"

# The index storing member entry type in Group entries
MEMBER_TYPE_INDEX = 0
# The name of the attribute storing the member entry type in Group entries
MEMBER_TYPE_ATTR_NAME = "member_type"

# The index storing the list of member entries tid in Group entries
MEMBER_TID_INDEX = 1
# The name of the attribute storing the list of member entries tid in Group entries
MEMBER_TID_ATTR_NAME = "members"

# The index where the first attribute appears in the internal entry data of DataStore.
ATTR_BEGIN_INDEX = 4
ATTR_BEGIN_INDEX = 2

# Name of the key to access the attribute dict of an entry type from
# ``_type_attributes`` of ``DataStore``.
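The constants above move the `tid` and entry type to the front of each entry list and replace fixed positions for `begin`, `end`, parent, child, and members with attribute names. A minimal sketch of what name-based lookup could look like follows. The `TYPE_ATTRIBUTES` map, the `"Sentence"` type, the `"speaker"` attribute, and `get_attribute` are hypothetical illustrations, not Forte's API; only the constant values are taken from the diff.

```python
from typing import Any, Dict, List

# Values copied from the new side of the diff.
TID_INDEX = 0
ENTRY_TYPE_INDEX = 1
ATTR_BEGIN_INDEX = 2
BEGIN_ATTR_NAME = "begin"
END_ATTR_NAME = "end"

# Hypothetical per-type map from attribute name to list position.
# Every attribute, including begin and end, lives after ATTR_BEGIN_INDEX.
TYPE_ATTRIBUTES: Dict[str, Dict[str, int]] = {
    "Sentence": {
        BEGIN_ATTR_NAME: ATTR_BEGIN_INDEX,
        END_ATTR_NAME: ATTR_BEGIN_INDEX + 1,
        "speaker": ATTR_BEGIN_INDEX + 2,
    }
}


def get_attribute(entry: List[Any], attr_name: str) -> Any:
    """Resolve an attribute by name instead of by a hard-coded index."""
    type_name = entry[ENTRY_TYPE_INDEX]
    return entry[TYPE_ATTRIBUTES[type_name][attr_name]]


# Layout in this sketch: [tid, entry type, begin, end, speaker]
entry = [1001, "Sentence", 0, 12, "alice"]
begin = get_attribute(entry, BEGIN_ATTR_NAME)
end = get_attribute(entry, END_ATTR_NAME)
```

Under this assumption only `tid` and the entry type keep fixed positions, so entry types without a span (for example links or groups) need no placeholder slots for `begin` and `end`.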
81 changes: 77 additions & 4 deletions forte/data/base_pack.py
@@ -436,12 +436,29 @@ def record_field(self, entry_id: int, field_name: str):
self._field_records[c] = {(entry_id, field_name)}

def on_entry_creation(
self, entry: Entry, component_name: Optional[str] = None
self,
entry: Entry,
component_name: Optional[str] = None,
):
"""
Call this when adding a new entry, will be called
in :class:`~forte.data.ontology.core.Entry` when
its `__init__` function is called.
its `__init__` function is called. This method performs
the following two operations when creating a new entry.

- All ``dataclass`` attributes of the entry to be created
are stored in the class-level dictionary of
:class:`~forte.data.ontology.core.Entry` called
``_cached_attribute_data``. This is used to initialize
the data store entry that corresponds to the entry object.
- On creation of the data store entry, this method associates
``getter`` and ``setter`` properties with all ``dataclass``
attributes of this entry to allow direct interaction
between the attributes of the entry and their copies
stored in the data store. For example, the ``setter`` method
updates the data store value of an attribute of a given entry
whenever the attribute in the entry's object is updated.

Args:
entry: The entry to be added.
@@ -541,16 +558,51 @@ def entry_getter(cls: Entry, attr_name: str):
def entry_setter(cls: Entry, value: Any, attr_name: str):
"""A setter function for dataclass fields of entry object.
When the value contains entry objects, we will convert them into
``tid``s before storing to ``DataStore``.
``tid``s before storing to ``DataStore``. Additionally, if the entry
setter method is called on an attribute of an entry that does not yet
have a pack associated with it (as is the case during initialization),
the value of the attribute is stored in the class-level cache of the
``Entry`` class. On the other hand, if a pack is associated with the
entry, the value is stored directly in the data store.

Args:
cls: An ``Entry`` class object.
value: The value to be assigned to the attribute.
attr_name: The name of the attribute.
"""
attr_value: Any

try:
pack = cls.pack
except AttributeError as err:
# This is the case when an object of an entry type that has already
# been created before (which means setter and getter properties are
# associated with its dataclass fields) is being initialized.
# In this case, a pack is not yet associated with this entry. Thus,
# we store the initial values of the dataclass fields of such entries
# in the _cached_attribute_data of the Entry class.

# pylint: disable=protected-access
if cls.entry_type() not in Entry._cached_attribute_data:
Entry._cached_attribute_data[cls.entry_type()] = {}

if (
attr_name
not in Entry._cached_attribute_data[cls.entry_type()]
):
Entry._cached_attribute_data[cls.entry_type()][
attr_name
] = value
return
else:
raise KeyError(
"You are trying to overwrite the value "
f"of {attr_name} for a data store entry "
"before it is created."
) from err

data_store_ref = (
cls.pack._data_store # pylint: disable=protected-access
pack._data_store # pylint: disable=protected-access
)

attr_type = data_store_ref.get_attr_type(
@@ -594,6 +646,27 @@ def entry_setter(cls: Entry, value: Any, attr_name: str):
tid=cls.tid, attr_name=attr_name, attr_value=attr_value
)

# If this is the first time an entry of this type is
# created, its attributes do not have getter and setter
# properties associated with them. We can thus assume that no
# key has yet been created in the _cached_attribute_data
# dictionary to store the dataclass fields of this entry. Thus,
# we create an empty dictionary to store the dataclass fields
# of this new entry and manually add all dataclass attributes
# that have been initialized to the _cached_attribute_data dict.
# We fetch the values of all dataclass fields by using the getattr
# method.

# pylint: disable=protected-access
if entry.entry_type() not in Entry._cached_attribute_data:
Entry._cached_attribute_data[entry.entry_type()] = {}
for name in entry.__dataclass_fields__:
attr_val = getattr(entry, name, None)
if attr_val is not None:
Entry._cached_attribute_data[entry.entry_type()][
name
] = attr_val

# Save the input entry object in DataStore
self._save_entry_to_data_store(entry=entry)
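The cache-then-store behavior described in the `on_entry_creation` and `entry_setter` docstrings can be illustrated with a small standalone sketch. This is not Forte's implementation: `ToyEntry`, `attach`, `toy_getter`, and `toy_setter` are hypothetical names, a plain dict stands in for the data store, and the flush of cached values on attachment is an assumption about how the cache is consumed.

```python
from typing import Any, Dict


class ToyEntry:
    """Hypothetical stand-in for an entry class; not Forte's ``Entry``."""

    # Class-level cache: entry type name -> {attribute name -> value}.
    _cached_attribute_data: Dict[str, Dict[str, Any]] = {}

    def __init__(self, text: str):
        # Routed through the property setter installed below.
        self.text = text

    def entry_type(self) -> str:
        return type(self).__name__

    def attach(self, store: Dict[str, Any]) -> None:
        """Attach a store and move the cached initial values into it."""
        cached = ToyEntry._cached_attribute_data.pop(self.entry_type(), {})
        store.update(cached)
        self.store = store


def toy_getter(entry: ToyEntry, attr_name: str) -> Any:
    # Reads always come from the attached store.
    return entry.store[attr_name]


def toy_setter(entry: ToyEntry, value: Any, attr_name: str) -> None:
    try:
        store = entry.store
    except AttributeError as err:
        # No store attached yet: keep the initial value in the class cache.
        cache = ToyEntry._cached_attribute_data.setdefault(
            entry.entry_type(), {}
        )
        if attr_name in cache:
            raise KeyError(
                f"Cannot overwrite {attr_name} before the entry is stored."
            ) from err
        cache[attr_name] = value
        return
    # Store attached: write through directly.
    store[attr_name] = value


# Install the property so attribute access goes through getter and setter.
ToyEntry.text = property(
    fget=lambda self: toy_getter(self, "text"),
    fset=lambda self, value: toy_setter(self, value, "text"),
)

first = ToyEntry("hello")  # cached, because no store is attached yet
cached_snapshot = dict(ToyEntry._cached_attribute_data["ToyEntry"])
backing_store: Dict[str, Any] = {}
first.attach(backing_store)
first.text = "world"  # written straight to the store
```

Keeping the cache keyed by entry type rather than by instance mirrors the structure shown in the diff, which is why a second write before attachment is rejected instead of silently overwriting the pending value.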

20 changes: 9 additions & 11 deletions forte/data/base_store.py
@@ -13,9 +13,8 @@
# limitations under the License.

from abc import abstractmethod
from typing import List, Iterator, Tuple, Any, Optional, Dict, Type
from typing import List, Iterator, Tuple, Any, Optional, Dict
import json
from forte.data.ontology.core import Entry

__all__ = ["BaseStore"]

@@ -128,10 +127,9 @@ def _deserialize(
def add_entry_raw(
self,
type_name: str,
attribute_data: List,
base_class: Type[Entry],
tid: Optional[int] = None,
allow_duplicate: bool = True,
attribute_data: Optional[List] = None,
) -> int:

r"""
@@ -143,19 +141,19 @@ def add_entry_raw(

Args:
type_name: The fully qualified type name of the new Entry.
attribute_data: It is a list that stores attributes relevant to
the entry being added. In order to keep the number of attributes
same for all entries, the list is populated with trailing None's.
base_class: The type of entry to add to the Data Store. This is
a reference to the class of the entry that needs to be added
to the DataStore. The reference can be to any of the classes
supported by the function.
tid: ``tid`` of the Entry that is being added.
It's optional, and it will be
auto-assigned if not given.
allow_duplicate: Whether we allow duplicate in the DataStore. When
it's set to False, the function will return the ``tid`` of
existing entry if a duplicate is found. Default value is True.
attribute_data: A ``list`` that stores attributes relevant to
the entry being added. The attributes passed in
``attribute_data`` must be present in that entry's
``type_attributes`` and must only be those attributes that are
relevant to the initialization of the entry. For example,
the begin and end positions when creating an entry of type
:class:`~forte.data.ontology.top.Annotation`.

Returns:
``tid`` of the entry.
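To make the revised `add_entry_raw` signature concrete, here is a rough sketch of a store that accepts only the initialization attributes and fills the remaining slots with `None`. `ToyStore` and its internals are hypothetical, and the positional matching of `attribute_data` against the type's attribute order is an assumption made for illustration; the real `DataStore` may validate and place values differently.

```python
import itertools
from typing import Any, Dict, List, Optional


class ToyStore:
    """Hypothetical in-memory store; not Forte's ``DataStore``."""

    def __init__(self, type_attributes: Dict[str, List[str]]):
        # Entry type name -> ordered attribute names (assumed layout).
        self._type_attributes = type_attributes
        self._entries: Dict[int, List[Any]] = {}
        self._tid_counter = itertools.count(1)

    def add_entry_raw(
        self,
        type_name: str,
        tid: Optional[int] = None,
        allow_duplicate: bool = True,
        attribute_data: Optional[List] = None,
    ) -> int:
        attr_names = self._type_attributes[type_name]
        values = list(attribute_data or [])
        if len(values) > len(attr_names):
            raise ValueError("More values than attributes for this type.")
        # Attributes not given at initialization start out as None.
        values += [None] * (len(attr_names) - len(values))

        if not allow_duplicate:
            for existing_tid, entry in self._entries.items():
                if entry[1] == type_name and entry[2:] == values:
                    return existing_tid

        if tid is None:
            tid = next(self._tid_counter)
        # Layout in this sketch: [tid, type name, attribute values...]
        self._entries[tid] = [tid, type_name] + values
        return tid


store = ToyStore({"Token": ["begin", "end", "pos"]})
token_tid = store.add_entry_raw("Token", attribute_data=[0, 5])
same_tid = store.add_entry_raw(
    "Token", allow_duplicate=False, attribute_data=[0, 5]
)
```

Making `attribute_data` optional and moving it last lets callers create entries whose type needs no initialization attributes without passing an empty list.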
9 changes: 5 additions & 4 deletions forte/data/data_pack.py
@@ -465,7 +465,7 @@ def get_payload_at(

"""
supported_modality = [enum.name for enum in Modality]

payloads_length: int = 0
try:
# if modality.name == "text":
if modality == Modality.Text:
@@ -1402,6 +1402,7 @@ def _generate_link_entry_data(
a_dict["parent"].append(
np.where(data[parent_type]["tid"] == link.parent)[0][0]
)

a_dict["child"].append(
np.where(data[child_type]["tid"] == link.child)[0][0]
)
@@ -1631,13 +1632,13 @@ def _save_entry_to_data_store(self, entry: Entry):
self._entry_converter.save_entry_object(entry=entry, pack=self)

if isinstance(entry, Payload):
if entry.modality == Modality.Text:
if Modality.Text.name == entry.modality_name:
entry.set_payload_index(len(self.text_payloads))
self.text_payloads.append(entry)
elif entry.modality == Modality.Audio:
elif Modality.Audio.name == entry.modality_name:
entry.set_payload_index(len(self.audio_payloads))
self.audio_payloads.append(entry)
elif entry.modality == Modality.Image:
elif Modality.Image.name == entry.modality_name:
entry.set_payload_index(len(self.image_payloads))
self.image_payloads.append(entry)
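The last hunk switches the payload dispatch from comparing enum members to comparing modality names. The sketch below shows that pattern in isolation, assuming the entry exposes its modality as a string. `ToyPayload` and `register_payload` are hypothetical, and the `Modality` enum here is a local stand-in rather than Forte's own.

```python
from enum import Enum
from typing import Dict, List


class Modality(Enum):
    """Local stand-in for the modality enum referenced in the diff."""

    Text = 1
    Audio = 2
    Image = 3


class ToyPayload:
    """Hypothetical payload that stores only the modality name."""

    def __init__(self, modality_name: str, cache: str):
        self.modality_name = modality_name
        self.cache = cache
        self.payload_idx = -1

    def set_payload_index(self, index: int) -> None:
        self.payload_idx = index


def register_payload(
    payloads: Dict[str, List[ToyPayload]], payload: ToyPayload
) -> None:
    """Append the payload to the list for its modality and record its index."""
    if payload.modality_name not in Modality.__members__:
        raise ValueError(f"Unsupported modality: {payload.modality_name}")
    bucket = payloads.setdefault(payload.modality_name, [])
    payload.set_payload_index(len(bucket))
    bucket.append(payload)


payloads: Dict[str, List[ToyPayload]] = {}
first_text = ToyPayload(Modality.Text.name, "hello")
second_text = ToyPayload(Modality.Text.name, "world")
audio = ToyPayload(Modality.Audio.name, "clip")
for item in (first_text, second_text, audio):
    register_payload(payloads, item)
```

Comparing by name works when the value held by the entry is a plain string, which would be the case if the modality is kept as an ordinary attribute in the data store.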
