Data in Two-dimensional Storage Structure

1 Data Objects

1.1 External Formats

The following objects can be edited in a consistent way:

Data files, including CSV files, Excel files, and text files.
Data in the MyBox Clipboard.
Matrices.
Database tables.

1.2 Storage Structure

Data are represented in a two-dimensional storage structure:

"Columns" define the attributes/fields of the data and extend the data horizontally.
"Rows" store instances of the data and extend the data set vertically.
All rows should have the same width, that is, the same number of columns.
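
The equal-width rule can be checked with a few lines of code. The following is a minimal sketch, not MyBox's own implementation: the function names `is_rectangular` and `pad_rows` are hypothetical, and padding short rows with empty strings is an assumption about one possible repair, since this guide does not say how ragged rows are treated.

```python
def is_rectangular(rows):
    """Return True when every row has the same number of columns."""
    widths = {len(row) for row in rows}
    return len(widths) <= 1


def pad_rows(rows, fill=""):
    """Pad short rows with `fill` so all rows reach the widest row's length.

    Assumption: padding is only one possible repair; it is not documented
    as what MyBox does.
    """
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


ragged = [["id", "name", "score"], ["1", "Ann"], ["2", "Bob", "7.5"]]
print(is_rectangular(ragged))   # the second row is too short
padded = pad_rows(ragged)
print(is_rectangular(padded))
```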

2 Create Data

Data can be created in one of the following ways:

Click or hover over the button "Create Data" and select a data type: CSV, Excel, Texts, MyBox Clipboard, Matrix, or Database Table.
Click the button "Load contents in System Clipboard" and select rows and columns.
Click or hover over the button "Examples" and select example data. The option "Only import definition" is available here.

New data are temporary. Required parameters, such as a file name or table name, are entered when the data are saved.

3 Open Data

Existing data can be opened in one of the following ways:

Click or hover over the button "Select File" to load data from an external data file.
Click the button "Select" (CTRL+T or ALT+T) to load data managed by MyBox.

4 Four Modes

4.1 Page data in html - read only

Displays the data of the current page as HTML.
Data cannot be edited.
Data can be modified or deleted in batch with menu functions.
The data definition can be modified.

4.2 Table - edit

Displays the data of the current page in a table.
Data can be edited.
Data can be modified or deleted in batch with menu functions.
The data definition can be modified.

Except for text files, multi-line text can be edited and saved in string values:

When the value is a single line (contains no line break), a text field is shown when the data cell is clicked; write "\n" as the line break in the value and commit the change (press Enter or click elsewhere).
When the value includes line breaks, a text field is shown when the data cell is clicked; write the text in multiple lines directly.

4.3 CSV - edit

Displays the data of the current page in CSV format.
Data can be modified.
Data can be modified or deleted in batch with menu functions.
The data definition can be modified.
Click the button "Delimiter" to reload the data with a different delimiter. This delimiter does not affect the source file.
If a value contains the delimiter or a line break, the value should be enclosed in quotes.
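
The quoting rule can be illustrated with Python's standard `csv` module. This is only a sketch of the general CSV convention, not of MyBox's own parser; the sample rows are made up for illustration.

```python
import csv
import io

# Sample rows: one value contains the delimiter, another contains a line break.
rows = [
    ["id", "note"],
    ["1", "plain"],
    ["2", "has, comma"],
    ["3", "two\nlines"],
]

# QUOTE_MINIMAL quotes only the values that need it.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back restores the original values, line break included.
parsed = list(csv.reader(io.StringIO(text), delimiter=","))
print(parsed == rows)
```

Without the quotes, the comma in the third line would be read as a column boundary and the line break in the fourth as a row boundary.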

4.4 Page data in texts - read only

Displays the data of the current page in text format.
Data cannot be edited.
Data can be modified or deleted in batch with menu functions.
The data definition can be modified.
Click the button "Delimiter" to reload the data with a different delimiter. This delimiter does not affect the source file.

5 Define Data

Click the button or menu item "Define Data".

5.1 Interface of Columns Management

Under the tab "Columns", add, delete, or change columns in the table view:

Column names must not be null or duplicated.
Click a table cell to edit its value directly.
Double-click a column, or select a column and click the button "Edit", to open the edit-column window:
- Click the button "OK" to save the modifications to the current column.
- Click the button "Recover" to reload the original definition of the column.
- Click the button "Select" to copy a column definition from the tree "Data Column".
- Click the button "Save" to write the current column definition to the tree "Data Column".
All columns can be renamed with sequence numbers.
Random colors can be set.
The order of columns can be adjusted.
Click the button "OK" to apply the column modifications to the "Table" of the current data.
Click the button "Recover" to discard the column modifications and reload the columns from the "Table" of the current data.
Click the button "Select" to copy column definitions from the tree "Data Column".
Click the button "Import" to load column definitions from an XML file.
Click the button "Export" to write column definitions to a file. Supported formats are XML, JSON, CSV, and Excel.

5.2 Types of Columns

Types of columns include: String, Double, Float, Long, Integer, Short, Boolean, Short Boolean, Datetime, Date, Era, Longitude, Latitude, Enumeration, Editable Enumeration, Enumerated Short, Color.
This attribute is used to display, edit, calculate, and save data.
Longitude and Latitude are generally defined together.

5.3 Format of Column

This attribute is mainly for display. When data are entered or edited, formats are not applied automatically and the original input is kept.
In some interfaces, such as "Copy" or "Export", the option "Save date/time/era and numbers as columns' formats" can be checked.

5.3.1 Format for Numbers

For numbers, the format can be: grouped in thousands, grouped in ten thousands, scientific notation, or no format.
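
The three number formats can be sketched as follows. The function names are hypothetical and the exact output MyBox produces (separator character, rounding mode) is an assumption; the sketch only shows what each grouping means.

```python
def group_thousands(value, scale=2):
    """Group the integer part in blocks of three digits."""
    return f"{value:,.{scale}f}"


def group_ten_thousands(value, scale=2):
    """Group the integer part in blocks of four digits."""
    sign = "-" if value < 0 else ""
    integer, _, fraction = f"{abs(value):.{scale}f}".partition(".")
    groups = []
    while len(integer) > 4:
        groups.insert(0, integer[-4:])
        integer = integer[:-4]
    groups.insert(0, integer)
    grouped = ",".join(groups)
    return sign + grouped + ("." + fraction if fraction else "")


def scientific(value, scale=2):
    """Scientific notation with `scale` digits after the decimal point."""
    return f"{value:.{scale}e}"


number = 1234567.891
print(group_thousands(number))      # 1,234,567.89
print(group_ten_thousands(number))  # 123,4567.89
print(scientific(number))           # 1.23e+06
```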

5.3.2 Format for Datetime/Date

For Datetime/Date, formats support the following: MM/dd/yy, yy-MM-dd, milliseconds, time zone, T separator, patch century, etc.
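
To make the pattern names concrete, the sketch below renders one moment in several of these styles using Python's `datetime`. The pattern letters in the guide (MM/dd/yy and so on) follow Java conventions; the `strftime` codes here are their Python counterparts, and the sample moment is made up. "Patch century" is not illustrated because its exact behavior is not described in this guide.

```python
from datetime import datetime, timezone

# A sample moment with milliseconds and an explicit time zone (UTC).
moment = datetime(2024, 3, 5, 14, 7, 9, 123000, tzinfo=timezone.utc)

formats = {
    "MM/dd/yy": moment.strftime("%m/%d/%y"),
    "yy-MM-dd": moment.strftime("%y-%m-%d"),
    # Milliseconds are derived from the microsecond field.
    "milliseconds": moment.strftime("%Y-%m-%d %H:%M:%S.")
                    + f"{moment.microsecond // 1000:03d}",
    # ISO style: "T" between date and time, followed by the zone offset.
    "T separator and time zone": moment.isoformat(timespec="milliseconds"),
}

for name, text in formats.items():
    print(f"{name}: {text}")
```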

5.3.3 Format for Era

For Era, formats support the following: MM/dd/yy, yy-MM-dd, milliseconds, time zone, T separator, patch century, prefix/suffix of "AD" and "BC" in Chinese and English, etc.

5.3.4 Define Enumeration

For Enumeration, a list of values can be defined.

5.3.5 Enumerated Short

Displayed as String but saved as Short.

5.5 Color of Column

Column color is mainly used for data charts.

When a chart is generated, its elements are displayed in the colors defined for their columns. The user can then switch the chart to random colors.

5.6 Attributes of Data

Under the tab "Attributes", set the data name, decimal scale, maximum value of random numbers, and description.

6 Verify Data Values

Either the rows in the current page or all rows can be verified.

The following are checked:
- If a column is defined as "not null", null values are invalid for this column.
- If a value does not conform to the column type, the value is invalid.
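
The two checks can be sketched as a single predicate. This is an illustration under stated assumptions, not MyBox's code: the function name `is_valid` is hypothetical, only three column types are covered, and treating the empty string as null is an assumption.

```python
def is_valid(value, column_type, not_null=False):
    """Return True when `value` passes the "not null" and type checks."""
    # Assumption: both None and the empty string count as null.
    if value is None or value == "":
        return not not_null
    try:
        if column_type == "Double":
            float(value)
        elif column_type == "Integer":
            int(value)
        # Assumption: any non-null text is accepted for String
        # (and for the types this sketch does not cover).
    except ValueError:
        return False
    return True


print(is_valid("123.4567", "Double"))          # True
print(is_valid("abc", "Double"))               # False
print(is_valid("", "Double", not_null=True))   # False
```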

Handling invalid values when editing:
- Reject the invalid value and restore the original value.
- For data files, there is an option to not reject invalid values; they are accepted but shown in an abnormal color.
- For matrices and database tables, invalid values are always rejected.

Handling invalid values when saving:
- Reject the invalid value and fail to save.
- For data files, there is an option to not reject invalid values; they are written to the file.
- For matrices and database tables, invalid values are always rejected.

Handling invalid values in other operations:
- Options to choose from: Fail, Use, Skip, Count as null, Count as zero.
- With "Fail", the operation fails when an invalid value is found.
- With "Use", the invalid value is taken and used in the operation.
- With "Skip", the involved column or row is ignored and does not take part in the operation.
- With "Count as null", the invalid value is replaced with null and takes part in the operation.
- With "Count as zero", the invalid value is replaced with zero and takes part in the operation.
- In some contexts, "Count as null" is equivalent to "Skip".

7 Edit Data

The principle for using columns is "maximum tolerance, minimum intervention".

7.1 Load Data

When data are loaded, column types are not checked; the original values are read and imported as they are.

7.2 Display Data

Values are parsed according to the columns' types.
Values are rewritten according to the columns' formats.
Invalid values are displayed in an abnormal color.

7.3 Controls for editing

For Boolean, a checkbox is provided.
For Enumeration, a list view with selectable values is provided.
For Color, a palette is provided.
For Longitude and Latitude, a map can be popped up to locate the coordinate.

7.4 Edit Data Cell

Click an editable cell to start editing.
When editing starts, the cell's original value is displayed, and both the type and format of the column are ignored.
While the user enters or modifies the value, the value in the edit control is checked against the column type:
- If the value is invalid, the edit control is displayed in an abnormal color.
- If the value is valid, the edit control is displayed in the normal color.
- The value is always kept exactly as the user entered it.

Press ENTER to commit the modification, or press ESC to cancel editing.
Option: auto-commit the modification when the cell loses focus (for example, when another control is clicked).
When the modification is committed, the value in the edit control is checked:
- If the value is unchanged, the column type is not checked and nothing is saved.
- If the value is changed, it is checked against the column type:
  - For an invalid value: if "Reject invalid value when edit" is selected, the original value is restored and displayed; otherwise the modification is committed and displayed in an abnormal color.
  - For a valid value: it is submitted and saved as the new value, and the saved value is then displayed according to the type and format of the column.

Example: the column type is Double, "Reject invalid value when edit" is selected, and the decimal scale is 2. When the value "abc" is read:
- The data cell is displayed as "abc" in an abnormal color.
- The user changes it to "abc123":
  - While the user types, the text field stays in an abnormal color.
  - After the user presses Enter or clicks another control, the data cell reverts to "abc" in an abnormal color.
- The user changes it to "123.4567":
  - While the user types, the text field stays in the normal color.
  - After the user presses Enter or clicks another control, the data cell is saved as "123.4567" and displayed as "123.46".

Other data cells are not affected. That is, data cells always keep their original values unless they are changed.
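
The commit rules for a Double column can be traced in a small simulation. This is a sketch of the behavior described above, not MyBox's implementation: the names `commit_edit` and `display` are hypothetical, and the boolean in the display result stands in for "normal color" versus "abnormal color".

```python
def is_double(text):
    """Return True when `text` can be parsed as a double."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def commit_edit(original, edited, reject_invalid=True):
    """Return the value stored in the cell after the edit is committed."""
    if edited == original:
        return original            # unchanged: no type check, nothing saved
    if is_double(edited):
        return edited              # valid: saved exactly as entered
    # invalid: restore the original value or keep the invalid input
    return original if reject_invalid else edited


def display(stored, scale=2):
    """Return (text shown in the cell, True if shown in normal color)."""
    if is_double(stored):
        return f"{float(stored):.{scale}f}", True
    return stored, False


cell = "abc"
print(display(cell))                     # ('abc', False)
cell = commit_edit(cell, "abc123")       # rejected, cell reverts to "abc"
print(cell, display(cell))
cell = commit_edit(cell, "123.4567")     # accepted and saved as entered
print(cell, display(cell))               # 123.4567 ('123.46', True)
```

Note that the stored value keeps all four decimals; only the displayed text is rounded to the decimal scale.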

8 Save Data

8.1 Interface

Click the button "Save" (CTRL+S or ALT+S) to write the modifications to the data.
Changes to rows in "Table", including modify/add/delete/sort, affect the rows of the current page in the file.
Changes to columns and attributes, including modify/add/delete/sort, affect all rows in the file.
Changes to attributes and columns are saved in the database.
Click the button "Save As" (F5, CTRL+B, or ALT+B) to write the data in another format: CSV, Excel, Texts, MyBox Clipboard, Matrix, Database Table, System Clipboard, JSON, XML, HTML, PDF.

8.2 Save Data File

Values are written to CSV/Texts/Excel files as strings.

Option "Reject invalid value when edit".

8.3 Save Database Table

Invalid values are always rejected for database tables.

Values are written to the database table using the nearest matching types:

Column Type of MyBox	Data Type of JDBC
String	VARCHAR
Double	DOUBLE
Float	FLOAT
Long	BIGINT
Integer	INT
Short	SMALLINT
Boolean	BOOLEAN
Datetime	TIMESTAMP
Date	DATE
Era	BIGINT
Longitude	DOUBLE
Latitude	DOUBLE
Enumeration	VARCHAR
Color	VARCHAR

Notice: Derby does not support negative dates, so Era is saved as a long value.
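
For scripts that prepare data for a database table, the mapping above can be expressed as a lookup. This is only a sketch: the names `MYBOX_TO_JDBC` and `jdbc_types` are hypothetical, and column types that do not appear in the table are deliberately left out rather than guessed.

```python
# Mapping copied from the table above; an unknown type raises KeyError.
MYBOX_TO_JDBC = {
    "String": "VARCHAR",
    "Double": "DOUBLE",
    "Float": "FLOAT",
    "Long": "BIGINT",
    "Integer": "INT",
    "Short": "SMALLINT",
    "Boolean": "BOOLEAN",
    "Datetime": "TIMESTAMP",
    "Date": "DATE",
    "Era": "BIGINT",        # stored as a long value, see the notice above
    "Longitude": "DOUBLE",
    "Latitude": "DOUBLE",
    "Enumeration": "VARCHAR",
    "Color": "VARCHAR",
}


def jdbc_types(columns):
    """Map (column name, MyBox type) pairs to (column name, JDBC type) pairs."""
    return [(name, MYBOX_TO_JDBC[mybox_type]) for name, mybox_type in columns]


columns = [("city", "String"), ("founded", "Era"), ("longitude", "Longitude")]
print(jdbc_types(columns))
```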

8.4 Save Matrix

Matrices are stored as text files under a special internal path.

Invalid values are always rejected for matrices.

All values in a matrix are saved as Double.

9 Calculate Data

Data are handled as original values, regardless of the types and formats of the columns.
Values are parsed as needed. For example, if a calculation requires double values, MyBox tries to convert the values to doubles.
If a value cannot be converted, the invalid value is handled as defined by the calculation.
Both the column and the calculation can define a decimal scale. The calculation's definition takes precedence over the column's.
Example: the column type is String, and descriptive statistics are run against it:
- MyBox tries to convert each value to a double.
- With "count invalid value as zero", invalid values are included in "Count" and take part in the calculation of "Mean" and "Sum".
- With "skip invalid value", invalid values are not involved in any calculation.
- With "count invalid value as empty", invalid values make the results of all calculations invalid (Double.NaN).
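
The three policies in this example can be compared side by side. The sketch below is an illustration, not MyBox's calculation code: the function `describe` and its `invalid_as` parameter are hypothetical, NaN stands in for Java's Double.NaN, and reporting "count" as a plain number under the "empty" policy is an assumption (the guide only says that results become invalid).

```python
import math


def to_double(text):
    """Return the value as a float, or None when conversion fails."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def describe(values, invalid_as="skip", scale=2):
    """Compute count, sum, and mean of string values under one policy."""
    numbers = []
    for text in values:
        number = to_double(text)
        if number is None:
            if invalid_as == "skip":
                continue                 # not involved in any calculation
            elif invalid_as == "zero":
                number = 0.0             # counted and calculated as zero
            elif invalid_as == "empty":
                number = math.nan        # poisons sum and mean
            else:
                raise ValueError(f"unknown policy: {invalid_as}")
        numbers.append(number)
    count = len(numbers)
    total = sum(numbers)
    mean = total / count if count else math.nan
    return {"count": count, "sum": round(total, scale), "mean": round(mean, scale)}


column = ["10", "abc", "20"]
for policy in ("zero", "skip", "empty"):
    print(policy, describe(column, invalid_as=policy))
```

With "zero" the mean is 10.0 over three values, with "skip" it is 15.0 over two values, and with "empty" both sum and mean are NaN.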

10 Sort Data

For any calculation that involves sorting, the data are transferred into a temporary database table and sorted by the database system.
Sorting results depend on the columns' types. For example, the string "124" is smaller than the string "18", while the number 124 is greater than the number 18.
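
The effect of the column type on sort order can be reproduced with any SQL database. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in; it is not the database MyBox uses, and the table and column names are made up.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# The same two values stored once as text and once as integers.
connection.execute("CREATE TABLE as_string (v TEXT)")
connection.execute("CREATE TABLE as_number (v INTEGER)")
for value in ("124", "18"):
    connection.execute("INSERT INTO as_string VALUES (?)", (value,))
    connection.execute("INSERT INTO as_number VALUES (?)", (int(value),))

string_order = [row[0] for row in
                connection.execute("SELECT v FROM as_string ORDER BY v")]
number_order = [row[0] for row in
                connection.execute("SELECT v FROM as_number ORDER BY v")]
connection.close()

print(string_order)  # ['124', '18']  compared character by character
print(number_order)  # [18, 124]      compared by numeric value
```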

11 Manage Data

MyBox records the definitions of data objects in its internal table:

	CSV/Excel/text files	MyBox Clipboard	Matrices	Database tables
When the data definition is created	When the data file is opened for the first time	When data are copied to the MyBox Clipboard	When a new matrix is saved in Matrices Manager	When a new table is saved in Database Table Manager
Storage location of data	Data file	Files under the MyBox internal path	Files under the MyBox internal path	Database tables of MyBox
When the data definition is deleted	The data file is not affected	The internal file is deleted	The internal file is deleted	The database table is dropped

12 Data File

Data files are external data. MyBox records their definitions and keeps them independent.

After being read or written by MyBox, data files should still be readable and writable by other tools.

12.1 CSV File

In a CSV file:

In general, the first line (header) defines the column names, and each following line defines a row of data.
Values are separated by a "delimiter", which can be a string.
If a value contains the delimiter or a line break, it should be enclosed in quotes.
If the delimiter is not "#", lines starting with "#" are skipped (as comments).

When MyBox handles a CSV file:

When the file is opened for the first time, the tool guesses its delimiter and charset.
If the file is read incorrectly, use the menu "File - Format" to change the options and click the button "OK".
Data can be saved with a different charset and delimiter.

12.2 Excel File

In an Excel file:

In general, the first line (header) defines the column names, and each following line defines a row of data.

When MyBox handles an Excel file:

If the file is read incorrectly, use the menu "File - Format" to change the options and click the button "OK".
With the menu "File - Sheet", select one sheet to handle, or add/rename/delete sheets.
Data can be saved with the current sheet only or with all worksheets.

Notice: The tool can only handle the base data in an Excel file. If the file includes formats, styles, formulas, or charts, it is recommended to save changes as a new file to avoid data loss.

12.3 Texts File

In a text file:

In general, the first line (header) defines the column names, and each following line defines a row of data.
Values are separated by a "delimiter", which can be a string. Regular expressions are supported when the file is parsed.
Values should not contain the delimiter or a line break.
If a line starts with "#", it is skipped.
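
These rules amount to a simple line-based parser. The following is a minimal sketch, not MyBox's parser: the function name `parse_text_lines`, the default delimiter pattern, and the decision to skip blank lines are all assumptions made for illustration.

```python
import re


def parse_text_lines(lines, delimiter_regex=r"\s*\|\s*"):
    """Split lines into a header and data rows using a regex delimiter.

    Lines starting with "#" are skipped. Skipping blank lines is an
    assumption of this sketch.
    """
    pattern = re.compile(delimiter_regex)
    rows = [
        pattern.split(line.rstrip("\n"))
        for line in lines
        if line.strip() and not line.startswith("#")
    ]
    if not rows:
        return [], []
    return rows[0], rows[1:]


sample = [
    "# exported sample\n",
    "id | name\n",
    "1 | Ann\n",
    "2 | Bob\n",
]
header, data = parse_text_lines(sample)
print(header)  # ['id', 'name']
print(data)    # [['1', 'Ann'], ['2', 'Bob']]
```

Because each line is one row, a value can hold neither the delimiter nor a line break, which is why quoting is not part of this format.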

When MyBox handles a text file:

When the file is opened for the first time, the tool guesses its delimiter and charset.
If the file is read incorrectly, use the menu "File - Format" to change the options and click the button "OK".
Multiple lines in a value are not supported.
Data can be saved with a different charset and delimiter.

13 Temporary Data File

Temporary data files are generated when:

An example is imported.
A data chart is created.
Data are calculated.

Temporary data are saved in CSV format, by default under the internal temporary files path of MyBox.

If files are under the MyBox temporary files path:

When MyBox exits, this path is cleared automatically, so save temporary data under another path if they are still needed.
They are not backed up automatically when edited.
They are not recorded when read or written, and do not appear in the list of "Recently accessed files".

Option: Save temporary data under the "generated" path.
When this option is checked, temporary data files are not treated as "temporary files".

More details can be found in "User Guide - Data Tools".