Branch Folder (CONCEPT)

A "Branch Folder", in the context of ToGatherUp, is a subcategory within the Project Data Structure hierarchy that is nested within the Root Folder or within another Branch Folder. It serves to organize and categorize a project's resources beyond the top-level Root Folder. Branch Folders can contain further folders that represent more specific categories, which helps create a hierarchical structure that reflects how the project's data is organized. Resources can be assigned a Data Structure Position within a Branch Folder or any of its subcategories, which supports efficient data management and organization, as well as logical and meaningful exportation of the data. In summary, Branch Folders provide an additional level of organization within the hierarchical categorization system that ToGatherUp uses to manage and organize project data.

Data Entry (INTERFACE)

Data Entry is a ToGatherUp tool designed to help researchers add text files to their corpus projects. It helps ensure that the entered data is consistent and standardized, which makes the information easier to manage, analyze, and retrieve later. To that end, Data Entry presents fields for the text files' metadata, as defined by the researcher in the Project Metadata Settings. With Data Entry, researchers can add text files to the corpus quickly and easily, keeping the data ready for linguistic analysis.
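To make the idea concrete, here is a minimal Python sketch of the kind of consistency check a data-entry form can perform. It assumes the Project Metadata Settings can be modeled as a dictionary that maps each field to its allowed options (or to `None` for free-form values such as dates). `METADATA_SETTINGS`, `validate_entry`, and every option other than "Researchers" and "English" are hypothetical and do not describe ToGatherUp's actual implementation.

```python
from datetime import date

# Hypothetical stand-in for the Project Metadata Settings: each field maps
# to its allowed options, or to None when any value is accepted.
METADATA_SETTINGS = {
    "Target Audience": ["Researchers", "Students", "General Public"],
    "Language": ["English", "Portuguese", "Spanish"],
    "Publication Date": None,
}


def validate_entry(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is consistent."""
    problems = []
    for field, options in METADATA_SETTINGS.items():
        if field not in metadata:
            problems.append(f"missing field: {field}")
        elif options is not None and metadata[field] not in options:
            problems.append(f"invalid value for {field}: {metadata[field]!r}")
    # Fields that the project settings do not define are also reported.
    for field in metadata:
        if field not in METADATA_SETTINGS:
            problems.append(f"unknown field: {field}")
    return problems


entry = {"Target Audience": "Researchers", "Language": "English",
         "Publication Date": date(2023, 3, 29)}
print(validate_entry(entry))  # []
print(validate_entry({"Language": "Klingon"}))
```

Rejecting unknown fields as well as invalid options is what keeps every entry aligned with the project's settings, which is the standardization the tool aims for.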

Data Exportation (INTERFACE)

The Data Exportation tool in ToGatherUp allows users to export selected data from their project in a convenient and customized way. Users can configure the exported file's settings, such as including or excluding headers and selecting the desired character encoding, in the "Export Configuration" panel. Once the settings are configured, users can select the content they want to export and generate a compressed file containing the selected data.
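The export step can be pictured with Python's standard `zipfile` module. This is a minimal sketch, not ToGatherUp's code: `export_selection` is a hypothetical helper, and because the glossary does not say what an exported "header" contains, the sketch assumes a single line naming the file.

```python
import io
import zipfile


def export_selection(texts: dict[str, str], selected: list[str],
                     include_header: bool = True,
                     encoding: str = "utf-8") -> bytes:
    """Pack the selected texts into an in-memory, compressed ZIP archive.

    `texts` maps file names to their content. The header format used below
    is a hypothetical choice made for illustration only.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in selected:
            body = texts[name]
            if include_header:
                body = f"# file: {name}\n{body}"  # hypothetical header line
            # Encode explicitly so the chosen character encoding is respected.
            archive.writestr(name, body.encode(encoding))
    return buffer.getvalue()


texts = {"ART-RSH-ENG-29Mar23-34.txt": "Análise de segmentação", "draft.txt": "..."}
data = export_selection(texts, ["ART-RSH-ENG-29Mar23-34.txt"],
                        include_header=False, encoding="latin-1")
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())  # ['ART-RSH-ENG-29Mar23-34.txt']
```

Encoding each text before writing it, rather than letting the archive pick a default, is what makes the encoding choice in the configuration panel meaningful for accented text.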

Data Import (INTERFACE)

Data Import is a ToGatherUp tool that allows users to import external data into their project. Before executing the import, users can review the "Data Structure and Files" panel to confirm that the imported file's data structure and files match their expectations. Once the import is executed, ToGatherUp registers the new data structure and adds the files contained in the imported file to the project, attempting to assign each file metadata that reflects its position in the data structure. Note that if the project already has a Data Structure, it will be replaced by the imported one.
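One plausible way to derive a Data Structure Position from an imported archive is to read each file's folder path. The Python sketch below assumes the imported file is a ZIP archive whose folders mirror the Data Structure; `read_import` is a hypothetical helper, and the format ToGatherUp actually accepts may differ.

```python
import io
import zipfile
from pathlib import PurePosixPath


def read_import(archive_bytes: bytes) -> tuple[set[str], dict[str, str]]:
    """Return (folders, positions) found in a ZIP archive (hypothetical helper).

    folders:   every folder path in the archive, i.e. the new Data Structure
    positions: archive path of each file -> its folder, used as the file's
               Data Structure Position ("" for top-level files)
    """
    folders: set[str] = set()
    positions: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            path = PurePosixPath(info.filename)
            if info.is_dir():
                folders.add(str(path))
                continue
            # Register every ancestor folder, even without an explicit ZIP entry.
            for folder in (path.parent, *path.parent.parents):
                if str(folder) != ".":
                    folders.add(str(folder))
            parent = str(path.parent)
            positions[str(path)] = "" if parent == "." else parent
    return folders, positions


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Artificial Intelligence/Computer Vision/paper.txt", "...")
folders, positions = read_import(buffer.getvalue())
print(sorted(folders))  # ['Artificial Intelligence', 'Artificial Intelligence/Computer Vision']
print(positions)
```

Because the import replaces any existing Data Structure, a caller would overwrite the project's folder set with `folders` rather than merge the two.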

Data Manager (INTERFACE)

Data Manager is the ToGatherUp interface that displays the texts of your project in an organized and simple list. With it, you can search for specific texts by their metadata, which makes locating the texts you need easier and faster. Data Manager gives you centralized access to all your texts so you can work with them more efficiently.
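A metadata search of this kind boils down to filtering a list of texts. The Python sketch below is an illustration only: the `Text` class, the `search` function, and the file names are hypothetical, and it assumes an exact match on every requested metadata value.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    """Hypothetical record for one text and its metadata."""
    file_name: str
    metadata: dict[str, str] = field(default_factory=dict)


def search(texts: list[Text], criteria: dict[str, str]) -> list[Text]:
    """Return the texts whose metadata match every criterion exactly."""
    return [
        text for text in texts
        if all(text.metadata.get(key) == value for key, value in criteria.items())
    ]


corpus = [
    Text("text-01.txt", {"Language": "English", "Target Audience": "Researchers"}),
    Text("text-02.txt", {"Language": "Portuguese", "Target Audience": "Researchers"}),
    Text("text-03.txt", {"Language": "English", "Target Audience": "Students"}),
]
hits = search(corpus, {"Language": "English", "Target Audience": "Researchers"})
print([text.file_name for text in hits])  # ['text-01.txt']
```

Criteria are passed as a dictionary rather than keyword arguments because metadata names such as "Target Audience" contain spaces.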

Data Structure (INTERFACE)

Data Structure is the ToGatherUp interface that allows you to visualize the hierarchical organization adopted in your project. A hierarchical folder structure is a method of organizing and arranging files and folders in a project. In ToGatherUp, files and folders are organized in a tree-like structure, which we call the Project Data Structure, with the topmost folder being the "root" folder and each subsequent folder branching out from it. Each branch can contain additional subfolders, which can in turn contain more subfolders, and so on. This gives files and folders a logical, easy-to-navigate organization and makes specific files easy to find and access once they are exported.
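The tree described above maps naturally onto a recursive data type. This Python sketch, with a hypothetical `Folder` class and `render` function, builds a small Project Data Structure using folder names borrowed from the Artificial Intelligence example in this glossary and prints it with one indentation level per depth.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical node of a Project Data Structure."""
    name: str
    children: list["Folder"] = field(default_factory=list)

    def add(self, name: str) -> "Folder":
        """Create a subfolder (a Branch Folder) and return it."""
        child = Folder(name)
        self.children.append(child)
        return child


def render(folder: Folder, depth: int = 0) -> list[str]:
    """Return the tree as indented lines, one folder per line."""
    lines = ["  " * depth + folder.name]
    for child in folder.children:
        lines.extend(render(child, depth + 1))
    return lines


root = Folder("Artificial Intelligence")   # Root Folder
ml = root.add("Machine Learning")          # Branch Folder
ml.add("Supervised Learning")              # Branch Folder nested in another
root.add("Computer Vision").add("Image Segmentation")
print("\n".join(render(root)))
# Artificial Intelligence
#   Machine Learning
#     Supervised Learning
#   Computer Vision
#     Image Segmentation
```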

Data Structure Position (METADATA)

A 'Data Structure Position' categorizes a resource by locating it within the project's hierarchy, called Project Data Structure in the context of ToGatherUp. This categorization facilitates efficient data management and organization and enhances the ability to export the data in a logical and meaningful way.

ToGatherUp is agnostic to the theoretical framework or approach adopted by researchers. To accomplish this, the tool works with the concept of a "Project Data Structure" and provides the metadata "Data Structure Position", both of which can be adapted to different theories or data categorization schemes.

Here's an example of how ToGatherUp's "Data Structure Position" and "Project Data Structure" concepts could fit a project that follows the Theory of Semantic Fields and aims to create a terminology for the field of Artificial Intelligence (AI):

Based on Gliozzo and Strapparava's (2009) explanation of the Theory of Semantic Fields:

[...] the lexicon is structured in clusters of very closely related concepts, lexicalized by sets of words. Word senses are determined and delimitated only by the meanings of other words in the same field. Such clusters of semantically related terms have been called Semantic Fields, and the theory explaining their properties is known as "The theory of Semantic Fields". This theory has been developed in the general framework of Saussure's structural semantics, whose basic claim is that a word meaning is determined by the "horizontal" paradigmatic and the "vertical" syntagmatic relations between that word and others in the whole language. Structural semantics is the predominant epistemological paradigm in linguistics, and it is very much appreciated in Computational Linguistic.

GLIOZZO, Alfio; STRAPPARAVA, Carlo. Semantic domains in computational linguistics. Springer Science & Business Media, 2009.

Let's say a team of terminologists is working on creating a terminology for Artificial Intelligence (AI) that can be used by researchers, academics, and practitioners in the field. To ensure that the terminology is comprehensive and well-organized, they want to organize publications about AI according to semantic fields.

They could create a Project Data Structure with a top-level category (Root Folder) called "Artificial Intelligence." Within this category, they might create subcategories (Branch Folders) for specific aspects of AI, such as "Machine Learning," "Natural Language Processing," "Computer Vision," and "Expert Systems." Each subcategory could be further divided into Semantic Domains, such as "Supervised Learning," "Speech Recognition," "Image Segmentation," and "Rule-Based Systems."

The team could then assign each publication, based on its content, a Data Structure Position that places it in one of the Semantic Fields defined in the Project Data Structure. For instance, a research paper that explores a new technique for image segmentation might be assigned a Data Structure Position within the "Computer Vision" subcategory and the "Image Segmentation" domain.

By using Semantic Domains as the basis for the project's hierarchy and assigning Data Structure Positions to each resource, the team can efficiently categorize and export their data according to a logical and meaningful hierarchy within the context of their project.
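Assuming a Data Structure Position can be represented as the path of folder names from the Root Folder down to the relevant Branch Folder, the AI example could be sketched in Python as below. The nested-dictionary layout and the helpers `is_valid_position` and `export_path` are hypothetical; they only illustrate how a position both validates against the hierarchy and determines where a file lands on export.

```python
# Hypothetical Project Data Structure from the AI example, as nested dicts.
STRUCTURE = {
    "Artificial Intelligence": {
        "Machine Learning": {"Supervised Learning": {}},
        "Natural Language Processing": {"Speech Recognition": {}},
        "Computer Vision": {"Image Segmentation": {}},
        "Expert Systems": {"Rule-Based Systems": {}},
    }
}


def is_valid_position(position: tuple[str, ...], structure: dict) -> bool:
    """Check that the position is non-empty and every step exists in the structure."""
    if not position:
        return False
    node = structure
    for name in position:
        if name not in node:
            return False
        node = node[name]
    return True


def export_path(position: tuple[str, ...], file_name: str) -> str:
    """Build the folder path under which a file would be exported."""
    return "/".join((*position, file_name))


position = ("Artificial Intelligence", "Computer Vision", "Image Segmentation")
print(is_valid_position(position, STRUCTURE))  # True
print(is_valid_position(("Artificial Intelligence", "Robotics"), STRUCTURE))  # False
print(export_path(position, "paper.txt"))
# Artificial Intelligence/Computer Vision/Image Segmentation/paper.txt
```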

File Name Convention (CONCEPT)

File Name Convention refers to a set of rules that dictate the structure and composition of the name given to a file. This convention typically includes various informative segments, such as the creation date, identification code, author name, content area, version, and others, which are combined to provide enough information to identify the file's content from its name. The main purpose of the File Name Convention is to establish a consistent and organized way of naming files to facilitate their identification, location, and management, particularly in large and complex projects. A well-designed File Name Convention can also help to create subgroups of files or subcorpora within a larger corpus of documents, based on the metadata present in their names.

How File Name Conventions are used in ToGatherUp:

In ToGatherUp, project files are automatically named according to a convention established by the project leader or manager. The file name is composed of segments that represent the metadata defined in the convention. Each segment has three characters, with some exceptions*, that abbreviate the metadata associated with the file. The segments are separated by hyphens and the file name always ends with the .txt extension.

To illustrate, imagine that a researcher submitted a file to the "Data Entry" section of ToGatherUp and assigned the metadata "Position in Data Structure," "Target Audience," "Language," and "Publication Date" to it.

For the "Position in Data Structure" metadata, the researcher selected the option "Artificial Intelligence".
For the "Target Audience" metadata, he selected the option "Researchers".
For the "Language" metadata, he selected the option "English".
For the "Publication Date" metadata, he indicated March 29, 2023.

In the project settings, the researcher established a "File Naming Convention" including these same metadata in this order:

1 - Position in Data Structure
2 - Target Audience
3 - Language
4 - Publication Date

Based on these metadata and the convention, ToGatherUp named the file as follows: ART-RSH-ENG-29Mar23-34.txt

As we can see, ToGatherUp used the first three characters of "Artificial Intelligence" to create the abbreviation ART. For the "Target Audience" and "Language" metadata, it abbreviated "Researchers" as RSH and "English" as ENG; these abbreviations were pre-established in the tool's internal settings. The "Publication Date" was included in the standardized format "29Mar23". Next, ToGatherUp automatically generated and appended the number 34, which is the file's Internal Identifier (ID) within the ToGatherUp database. The ID is mandatory for all files and ensures that no two files in the project share the same name: without it, files with identical metadata would receive identical names. Finally, ToGatherUp added the .txt extension to the file name.

*Exceptions: metadata related to dates and numbers are not abbreviated.
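The naming steps in the example above can be reproduced with a short Python sketch. It is a reconstruction under stated assumptions, not ToGatherUp's code: the hypothetical `ABBREVIATIONS` table stands in for the tool's internal settings, text values without a pre-established abbreviation fall back to their first three characters in upper case, and dates are formatted with `strftime("%d%b%y")`, which yields "29Mar23" under the default C/English locale.

```python
from datetime import date

# Hypothetical stand-in for the abbreviations pre-established in the tool.
ABBREVIATIONS = {"Researchers": "RSH", "English": "ENG"}


def abbreviate(value) -> str:
    """Turn one metadata value into a file-name segment."""
    if isinstance(value, date):
        return value.strftime("%d%b%y")  # dates are not abbreviated, e.g. 29Mar23
    if isinstance(value, (int, float)):
        return str(value)                # numbers are not abbreviated
    # Assumed fallback: first three characters, upper-cased.
    return ABBREVIATIONS.get(value, value[:3].upper())


def build_file_name(convention: list[str], metadata: dict, file_id: int) -> str:
    """Join the segments in convention order, then the internal ID and .txt."""
    segments = [abbreviate(metadata[element]) for element in convention]
    segments.append(str(file_id))  # the ID keeps file names unique
    return "-".join(segments) + ".txt"


convention = ["Position in Data Structure", "Target Audience",
              "Language", "Publication Date"]
metadata = {
    "Position in Data Structure": "Artificial Intelligence",
    "Target Audience": "Researchers",
    "Language": "English",
    "Publication Date": date(2023, 3, 29),
}
print(build_file_name(convention, metadata, 34))  # ART-RSH-ENG-29Mar23-34.txt
```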

File Naming Convention Element (CONCEPT)

A "File Naming Convention Element" refers to the specific metadata segments included in a File Naming Convention that are used to compose the name of a file. These elements typically include informative segments such as creation date, identification code, author and others that are combined to provide enough information to identify the file's content from its name. In ToGatherUp, each File Naming Convention element is represented by a three-character abbreviation, except for metadata related to dates and numbers. A well-designed set of File Naming Convention elements provides a consistent and organized way of naming files to facilitate their identification, location, and management, particularly in large and complex projects.

Metadata (CONCEPT)

Metadata is data that provides information about other data. It supplies descriptive information that gives data context, identifies its characteristics, and facilitates its organization and retrieval.

Metadata Metrics (CONCEPT)

"Metadata Metrics" are quantitative measures used to evaluate specific characteristics of each type of metadata in a research project within ToGatherUp. These metrics may include the number of files and words associated with each metadata or, for specific metadata such as "Processing Time" and "Number of Pages", the total time spent on processing and the total number of pages converted to plain text, respectively. Metadata metrics help evaluate the quality and efficiency of collecting, preparing, and inserting data into the project, allowing researchers to monitor and optimize the use of each metadata. By tracking these metrics, researchers can ensure that the collected data is consistent, reliable, and suitable for future analysis.

In addition, "Metadata Metrics" are an important tool for visually tracking the evolution of text collection for a corpus in a research project within ToGatherUp. By analyzing the metrics, researchers can track the amount of collected data and identify possible problems or gaps in the collection process, allowing them to make the adjustments needed to keep the project effective and efficient. In this way, "Metadata Metrics" help verify whether data collection meets the criteria established for the research.
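Both kinds of metric mentioned above, file and word counts per metadata value and totals of numeric metadata, can be sketched in a few lines of Python. The dictionary layout of a text, the whitespace-based word count, and the unit of "Processing Time" (minutes here) are assumptions for illustration, and `counts_by_value` and `total` are hypothetical helpers.

```python
from collections import Counter


def counts_by_value(texts: list[dict], field: str) -> dict[str, dict[str, int]]:
    """Number of files and words recorded under each value of `field`."""
    files: Counter = Counter()
    words: Counter = Counter()
    for text in texts:
        value = text["metadata"].get(field)
        if value is None:
            continue  # this text has no value for the metadata
        files[value] += 1
        words[value] += len(text["content"].split())  # naive whitespace count
    return {value: {"files": files[value], "words": words[value]} for value in files}


def total(texts: list[dict], field: str) -> float:
    """Sum a numeric metadata such as "Processing Time" or "Number of Pages"."""
    return sum(text["metadata"].get(field, 0) for text in texts)


texts = [
    {"content": "Neural networks learn features",
     "metadata": {"Language": "English", "Number of Pages": 12, "Processing Time": 15}},
    {"content": "Redes neurais aprendem",
     "metadata": {"Language": "Portuguese", "Number of Pages": 8, "Processing Time": 10}},
    {"content": "Image segmentation with transformers",
     "metadata": {"Language": "English", "Number of Pages": 20, "Processing Time": 25}},
]
print(counts_by_value(texts, "Language"))
print(total(texts, "Number of Pages"))  # 40
print(total(texts, "Processing Time"))  # 50 (minutes, by assumption)
```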

Project (CONCEPT)

A project is a temporary endeavor with a specific goal, set of tasks, and timeline, designed to produce a unique deliverable or outcome. Projects are typically planned and executed by teams and rely on resources such as time, people, and technology to succeed. Effective project management involves coordinating and managing these resources, as well as managing risks, resolving issues, and communicating with stakeholders to ensure that the project meets its objectives and delivers the desired outcome.

In the context of Corpus Linguistics research, a project can be defined as a systematic and structured effort to collect, process, and analyze a corpus of linguistic data. A corpus is a large and structured collection of texts that is used to study language use and patterns. Corpus linguistics research projects may involve the creation of a new corpus, the expansion of an existing corpus, or the analysis of a corpus to investigate a particular research question or hypothesis. A typical corpus linguistics research project may involve the following steps:

1 - Defining the research question or hypothesis to be investigated.
2 - Identifying and collecting a corpus of linguistic data that is relevant to the research question.
3 - Processing the corpus, which may involve tasks such as cleaning, tagging, and parsing the data.
4 - Analyzing the corpus using statistical or computational methods to identify patterns or relationships in the data.
5 - Interpreting the results and drawing conclusions about the research question or hypothesis.

Effective project management is critical to the success of Corpus Linguistics research projects. ToGatherUp's platform provides an easy and effective way to create and manage high-quality research corpora that are easily retrievable for analysis, making it an integral component of Corpus Linguistics research.

Project Catalog (INTERFACE)

Project Catalog is the ToGatherUp interface that provides researchers with a view of all the projects they are involved in. This user-friendly interface allows researchers to easily access and manage their projects, view project details, track progress, and collaborate with team members. With the Project Catalog, researchers can streamline their work and stay organized, enabling them to focus on their research and achieve their goals more efficiently.

Project Data Structure (CONCEPT)

"Project Data Structure" is the hierarchical categorization system used in ToGatherUp to efficiently manage and organize data in a project. It is agnostic to the theoretical framework or approach adopted by researchers and can be adapted to different theories or data categorization schemes. ToGatherUp provides the metadata "Data Structure Position" to categorize resources within the Project Data Structure hierarchy.

Project Leader (CONCEPT)

A Project Leader, in the context of ToGatherUp, is a researcher who creates a project within the ToGatherUp platform. The Project Leader is responsible for initiating the project, defining its scope and goals, and configuring it. Once the project is created, the Project Leader leads it from start to finish, assembling and managing the project team, coordinating the efforts of team members, and ensuring that the project stays on track and within scope. The Project Leader communicates with stakeholders, team members, and senior management to keep them informed of the project's status and to make decisions that keep the project on track. Additionally, the Project Leader manages risks, resolves conflicts, and ensures that the project meets its objectives and delivers the expected results.

Because ToGatherUp's user-friendly interface and advanced features make it suitable for both beginners and advanced users, and an ideal tool for anyone who values managing text data effectively, a Project Leader can be a linguist, a researcher, or anyone who needs to manage and analyze large amounts of text data for research purposes.

Project Manager (CONCEPT)

A Project Manager is a person who assists the Project Leader in managing and executing the project.

Project Member (CONCEPT)

A project member is a person who is part of a project team and contributes to the completion of a project. The role of a project member can vary depending on the nature of the project, but members are generally responsible for specific tasks or objectives defined by the project manager. Project members can have different levels of responsibility, expertise, and influence, but they all work toward the common goal of completing the project successfully. They may come from the same organization or from different ones, and they may work in the same location or remotely.

Project Overview (CONCEPT)

Project Overview is a comprehensive interface within ToGatherUp that provides researchers with a centralized view of all aspects of their project. This interface allows researchers to easily access and manage various data-related interfaces, including Data Entry, Data Manager, Data Structure, Data Exportation, Data Import, and their respective settings. In addition, the Project Overview interface also enables researchers to manage their team and view the identification and status of their project, including active metadata and file name conventions. Researchers can also track metrics for each project metadata through this interface, allowing for streamlined management and optimization of the research process. Overall, the Project Overview interface provides researchers with a powerful tool to effectively manage their projects and data, facilitating more efficient and productive research outcomes.

Root Folder (CONCEPT)

The "Root Folder", in the context of ToGatherUp, is the top-level category of the Project Data Structure hierarchy. It serves as the starting point for organizing and categorizing resources within a project. The Root Folder can contain subcategories (Branch Folders), which can in turn be divided into more specific categories, forming the tree-like hierarchy called the Project Data Structure. Resources can be assigned a Data Structure Position within this hierarchy, which facilitates efficient data management and organization and enhances the ability to export the data in a logical and meaningful way. In summary, the Root Folder is the highest level of the hierarchical categorization system that ToGatherUp uses to organize and manage project data.
