Data in Two-dimensional Storage Structure

1 Data Objects

1.1 External Formats

The following objects can be edited in a consistent way:

1.2 Storage Structure

Data are represented as a two-dimensional storage structure:

2 Create Data

Data can be created in one of the following ways:

New data are temporary. Required parameters, such as a file name or table name, must be entered when the data are saved.

3 Open Data

Existing data can be opened in one of the following ways:

4 Four Modes

4.1 Page data in html - read only

4.2 Table - edit

Except for texts files, multi-line values can be edited and saved as string values:

4.3 CSV - edit

4.4 Page data in texts - read only

5 Define Data

In "Table - edit" or "CSV - edit" mode, click the "Define Data" button.

5.1 Interface of Columns Management

Under the "Columns" tab, add, delete, or change columns in the table view:

  1. Column names must not be empty or duplicated.

  2. Click a table cell to edit its value directly.

  3. Select a row and click the "Edit" button to open the edit window.

  4. All columns can be renamed with sequence numbers.

  5. Random colors can be assigned to columns.

  6. The order of columns can be adjusted.

  7. Click the "OK" button to apply the column modifications to the "Table" of the current data.

  8. Click the "Recover" button to discard the column modifications and reload the definitions from the "Table" of the current data.

5.2 Types of Columns

  1. Types of columns include: String, Double, Float, Long, Integer, Short, Boolean, Datetime, Date, Era, Longitude, Latitude, Enumeration, Editable Enumeration, Color.

  2. The column type is used when data are displayed, edited, calculated, and saved.

  3. Longitude and Latitude are generally defined together.

5.3 Format of Column

  1. The format is mainly used for display. When data are entered or edited, the format is not applied automatically and the original input is kept.

  2. In some interfaces, such as "Copy" or "Export", the option "Save date/time/era and numbers as columns' formats" can be checked.

5.3.1 Format for Numbers

For numbers, the format can be: grouping in thousands, grouping in ten thousands, scientific notation, or no format.
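The four number formats can be illustrated with a minimal sketch based on java.text.DecimalFormat. The class name, the enum `Style`, and the method `display` are hypothetical names chosen for this example, and the patterns are assumptions; MyBox's own implementation may differ.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class NumberFormatSketch {

    // Hypothetical names for the four options listed above.
    enum Style { GROUP_THOUSANDS, GROUP_TEN_THOUSANDS, SCIENTIFIC, NONE }

    // Returns only the displayed text; the stored value is not changed.
    static String display(double value, Style style) {
        // Fixed symbols so that the output does not depend on the system locale.
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        switch (style) {
            case GROUP_THOUSANDS:
                return new DecimalFormat("#,##0.###", symbols).format(value);
            case GROUP_TEN_THOUSANDS: {
                DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
                format.setGroupingSize(4); // a separator every four integer digits
                return format.format(value);
            }
            case SCIENTIFIC:
                return new DecimalFormat("0.###E0", symbols).format(value);
            default:
                return String.valueOf(value);
        }
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        for (Style style : Style.values()) {
            System.out.println(style + " -> " + display(value, style));
        }
    }
}
```

With the value 1234567.891, the grouped styles give "1,234,567.891" and "123,4567.891", and the scientific style gives "1.235E6".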

5.3.2 Format for Datetime/Date

For Datetime/Date, formats support the following: MM/dd/yy, yy-MM-dd, milliseconds, time zone, T separator, patch century, etc.
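A few of these patterns can be tried with java.time, as in the sketch below. The class and method names are hypothetical, and using DateTimeFormatter is an assumption made for illustration, not a statement about how MyBox formats dates internally.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class DateFormatSketch {

    // Formats a value for display with the given pattern; the value itself is unchanged.
    static String display(LocalDateTime value, String pattern) {
        return value.format(DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
    }

    public static void main(String[] args) {
        LocalDateTime value = LocalDateTime.of(2024, 3, 5, 14, 7, 9);
        System.out.println(display(value, "MM/dd/yy"));                // 03/05/24
        System.out.println(display(value, "yy-MM-dd"));                // 24-03-05
        // T separator and milliseconds
        System.out.println(display(value, "yyyy-MM-dd'T'HH:mm:ss.SSS")); // 2024-03-05T14:07:09.000
    }
}
```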

5.3.3 Format for Era

For Era, formats support the following: MM/dd/yy, yy-MM-dd, milliseconds, time zone, T separator, patch century, prefix/suffix of "AD" and "BC" in Chinese and English, etc.

5.3.4 Define Enumeration

For Enumeration, a list of values can be defined.

5.4 Handle Invalid Values

  1. This attribute defines how a column handles invalid values. The choices are: skip, count as empty, and count as zero.

  2. In some contexts, "count as empty" is equivalent to "skip".

  3. This attribute is only used for display and calculation. When data are entered or edited, invalid values are not handled automatically.
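The effect of the three choices on a calculation can be sketched as follows. This is a minimal illustration with hypothetical names (`InvalidAs`, `parse`, `mean`); it assumes a mean over a column of raw strings, a context in which "count as empty" behaves like "skip".

```java
import java.util.List;
import java.util.OptionalDouble;

class InvalidValueSketch {

    // Hypothetical names for the three choices listed above.
    enum InvalidAs { SKIP, EMPTY, ZERO }

    // Returns the parsed number, 0 for an invalid value under ZERO, or null otherwise.
    static Double parse(String raw, InvalidAs policy) {
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException | NullPointerException e) {
            return policy == InvalidAs.ZERO ? 0d : null;
        }
    }

    // Mean of a column of raw strings. The raw strings are never modified.
    static OptionalDouble mean(List<String> column, InvalidAs policy) {
        return column.stream()
                .map(raw -> parse(raw, policy))
                .filter(value -> value != null) // SKIP and EMPTY both drop the value here
                .mapToDouble(Double::doubleValue)
                .average();
    }

    public static void main(String[] args) {
        List<String> column = List.of("1", "abc", "3");
        for (InvalidAs policy : InvalidAs.values()) {
            System.out.println(policy + " -> " + mean(column, policy).getAsDouble());
        }
    }
}
```

For the column "1", "abc", "3", the mean is 2.0 under "skip" and "count as empty" because "abc" is left out, while "count as zero" averages three values and gives about 1.33.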

5.5 Color of Column

Column color is mainly used for data charts.

When a chart is generated, its elements are displayed in the colors defined for their columns. The user can then switch the chart to random colors.

5.6 Attributes of Data

Under the "Attributes" tab, set the data name, decimal scale, maximum random value, and description.

6 Verify Data Values

  1. Verification can cover either the rows in the current page or all rows.

  2. The following are checked:

  3. Options:

7 Edit Data

The principle for using columns is "maximum tolerance and minimum intervention".

7.1 Load Data

When data are loaded, column types are not checked; the original values are read and imported as they are.

7.2 Display Data

  1. Values are parsed according to the column types.

  2. Invalid values are handled as defined for the columns.

  3. Values are rewritten in the columns' formats.

  4. Displayed values may differ from the actual stored values.

  5. Example: the column type is Double and the value "abc" is read:
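The difference between a stored value and its displayed value can be sketched as below. This is an illustration under stated assumptions, not MyBox's own code: the names `DisplaySketch` and `display` are hypothetical, only a Double column is modeled, and only the "count as empty" and "count as zero" choices are shown.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class DisplaySketch {

    // Computes the text shown for a stored string in a Double column.
    // The stored string is only read, never changed.
    static String display(String stored, boolean invalidAsZero, int scale) {
        Double parsed;
        try {
            parsed = Double.valueOf(stored.trim());
        } catch (NumberFormatException e) {
            parsed = invalidAsZero ? 0d : null; // null stands for "count as empty"
        }
        if (parsed == null) {
            return "";
        }
        return BigDecimal.valueOf(parsed).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        String stored = "abc";
        System.out.println("[" + display(stored, false, 2) + "]"); // shown as empty
        System.out.println("[" + display(stored, true, 2) + "]");  // shown as 0.00
        System.out.println("stored value is still: " + stored);
    }
}
```

Under these assumptions, "abc" is shown as an empty cell or as "0.00", depending on the column's setting, while the stored value remains "abc".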

7.3 Controls for editing

7.4 Edit Data Cell

  1. Click an editable cell to start editing.

  2. When editing starts, the cell's original value is displayed; both the type and the format of the column are ignored.

  3. While the user enters or modifies the value, the value in the edit control is checked against the column type:

  4. Press ENTER to commit the modification, or press ESC to cancel editing.

  5. Option: Auto-commit the modification when the cell loses focus (for example, when another control is clicked).

  6. When the modification is committed, the value in the edit control is checked:

  7. Example: the column type is Double, invalid values count as empty, and the decimal scale is 2. When "abc" is read:

  8. Other data cells are not affected. That is, data cells always keep their original values unless they are changed.

  9. Option: Validate data when editing. If this option is not selected, data are not validated during editing.

8 Save Data

8.1 Interface

8.2 Save Data File

Values are written to CSV, Texts, and Excel files as strings.

Option: Validate data when saving. If this option is not selected, data are not validated when they are saved.

8.3 Save Database Table

Values in database table are always validated.

Values are written to the database table using the nearest matching types:

| Column Type of MyBox | Data Type of JDBC |
| --- | --- |
| String | VARCHAR |
| Double | DOUBLE |
| Float | FLOAT |
| Long | BIGINT |
| Integer | INT |
| Short | SMALLINT |
| Boolean | BOOLEAN |
| Datetime | TIMESTAMP |
| Date | DATE |
| Era | BIGINT |
| Longitude | DOUBLE |
| Latitude | DOUBLE |
| Enumeration | VARCHAR |
| Color | VARCHAR |

Note: Derby does not support negative dates, so Era is saved as a long.
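The mapping in the table can be expressed as a lookup against the constants in java.sql.Types. The sketch below is only an illustration: the enum `ColumnType` and the method `jdbcType` are hypothetical names, and the SQL type INT corresponds to the constant Types.INTEGER.

```java
import java.sql.Types;

class JdbcTypeSketch {

    // Hypothetical enum mirroring the column types in the table above.
    enum ColumnType {
        STRING, DOUBLE, FLOAT, LONG, INTEGER, SHORT, BOOLEAN,
        DATETIME, DATE, ERA, LONGITUDE, LATITUDE, ENUMERATION, COLOR
    }

    // Returns the java.sql.Types constant for the nearest JDBC type.
    static int jdbcType(ColumnType type) {
        switch (type) {
            case DOUBLE:
            case LONGITUDE:
            case LATITUDE:
                return Types.DOUBLE;
            case FLOAT:
                return Types.FLOAT;
            case LONG:
            case ERA: // saved as a long because Derby does not support negative dates
                return Types.BIGINT;
            case INTEGER:
                return Types.INTEGER;
            case SHORT:
                return Types.SMALLINT;
            case BOOLEAN:
                return Types.BOOLEAN;
            case DATETIME:
                return Types.TIMESTAMP;
            case DATE:
                return Types.DATE;
            default: // STRING, ENUMERATION, COLOR
                return Types.VARCHAR;
        }
    }

    public static void main(String[] args) {
        for (ColumnType type : ColumnType.values()) {
            System.out.println(type + " -> " + jdbcType(type));
        }
    }
}
```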

8.4 Save Matrix

Values in matrix are always validated.

All values in Matrix are saved as Double.

9 Calculate Data

  1. Data are handled as original values, regardless of the types and formats of columns.

  2. Values are parsed as needed. For example, if the calculation requires double values, MyBox tries to convert the values to doubles.

  3. If value conversion fails, the invalid value is handled as defined for the column.

  4. The calculation itself can define how to handle invalid values. The definition of the calculation takes priority over the definition of the column.

  5. Both the column and the calculation can define a decimal scale. The definition of the calculation takes priority over the definition of the column.

  6. Example: the column type is String, and descriptive statistics are run against it:
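The precedence rule in items 4 and 5 can be sketched in a few lines. The names below are hypothetical, and modeling an unset calculation setting as an empty OptionalInt is an assumption made for this example.

```java
import java.util.OptionalInt;

class PrecedenceSketch {

    // The calculation's own decimal scale wins when it is set;
    // otherwise the column's decimal scale is used.
    static int effectiveScale(OptionalInt calculationScale, int columnScale) {
        return calculationScale.orElse(columnScale);
    }

    public static void main(String[] args) {
        System.out.println(effectiveScale(OptionalInt.of(4), 2));   // 4
        System.out.println(effectiveScale(OptionalInt.empty(), 2)); // 2
    }
}
```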

10 Sort Data

  1. For any calculation that involves sorting, data are converted into a temporary database table and sorted by the database system.

  2. Sorting results depend on the columns' types. For example, the string "124" is smaller than the string "18", while the number 124 is bigger than the number 18.
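The difference between string order and numeric order can be reproduced with the in-memory sketch below. It only illustrates the comparison rule; it does not use a database table as MyBox does, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortSketch {

    // Lexicographic order, as for a String column.
    static List<String> sortAsStrings(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Numeric order, as for a number column. Assumes every value parses as a number.
    static List<String> sortAsNumbers(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingDouble((String s) -> Double.parseDouble(s)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> values = List.of("18", "124", "9");
        System.out.println(sortAsStrings(values)); // [124, 18, 9]
        System.out.println(sortAsNumbers(values)); // [9, 18, 124]
    }
}
```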

11 Manage Data

MyBox records the definitions of data objects in an internal table:


| | CSV/Excel/text files | MyBox Clipboard | Matrices | Database tables |
| --- | --- | --- | --- | --- |
| When the data definition is created | When the data file is opened for the first time | When data are copied to MyBox Clipboard | When a new matrix is saved in Matrices Manager | When a new table is saved in Database Table Manager |
| Storage location of data | Data file | Files under the MyBox internal path | Database table of MyBox | Database tables of MyBox |
| When the data definition is deleted | Data file is not affected | Internal file is deleted | Data of the matrix are cleared | Database table is dropped |


12 Data File

Data files are external data. MyBox records their definitions and keeps them independent.

After MyBox reads or writes a data file, other tools should still be able to read and write it properly.

12.1 CSV File

In a CSV file:

When MyBox handles a CSV file:

12.2 Excel File

In an Excel file:

When MyBox handles an Excel file:

Note: This tool can only handle the basic data in an Excel file. If the file includes formats, styles, formulas, or charts, save changes as a new file to avoid data loss.

12.3 Texts File

In a texts file:

When MyBox handles a texts file:

13 Temporary Data File

Temporary data files are generated when:

Temporary data are saved in CSV format, by default under the internal temporary files path of MyBox.

If files are under the MyBox temporary files path:

Option: Save temporary data under the "generated" path.
When this option is checked, temporary data files are not treated as "temporary files".


For more details, refer to "User Guide - Data Tools".