Modifications

`src.brightwebapp.modifications` ¶

`_create_user_input_columns(df_original, df_user_input)` ¶

Given two dataframes with at least the columns 'UID', 'SupplyAmount', 'BurdenIntensity', returns a dataframe with additional columns 'SupplyAmount_USER', 'BurdenIntensity_USER'where only the user-provided values are kept. All other values in these new columns are replaced byNaN`.

For instance, given an 'original' DataFrame of the kind:

UID	SupplyAmount	BurdenIntensity
0	1	0.1
1	0.5	0.5
2	0.2	0.3

and a "user input" DataFrame of the kind (modified values highlighted):

UID	SupplyAmount	BurdenIntensity
0	1	0.1
1	0	0.5
2	0.2	2.1

the function returns a DataFrame of the kind:

UID	SupplyAmount	SupplyAmount_USER	BurdenIntensity	BurdenIntensity_USER
0	1	NaN	0.1	NaN
1	0.5	0	0.5	NaN
2	0.2	NaN	0.3	2.1

Parameters:

Name	Type	Description	Default
`df_original`	`DataFrame`	Original DataFrame. Must have at least columns `'UID', 'SupplyAmount', 'BurdenIntensity'`.	required
`df_user_input`	`DataFrame`	User input DataFrame. Must have at least columns `'UID', 'SupplyAmount', 'BurdenIntensity'`.	required

Returns:

Type	Description
`DataFrame`	Output DataFrame.

Raises:

Type	Description
`ValueError`	If the set of UIDs in `df_original` and `df_user_input` do not match exactly.

Source code in src/brightwebapp/modifications.py

def _create_user_input_columns(
        df_original: pd.DataFrame,
        df_user_input: pd.DataFrame,
    ) -> pd.DataFrame:
    """
    Given two dataframes with at least the columns `'UID', 'SupplyAmount', 'BurdenIntensity'`,
    returns a dataframe with additional columns 'SupplyAmount_USER', 'BurdenIntensity_USER'`
    where only the user-provided values are kept. All other values in these new columns are replaced by `NaN`.

    For instance, given an 'original' DataFrame of the kind:

    | UID | SupplyAmount | BurdenIntensity |
    |-----|--------------|-----------------|
    | 0   | 1            | 0.1             |
    | 1   | 0.5          | 0.5             |
    | 2   | 0.2          | 0.3             |

    and a "user input" DataFrame of the kind (modified values highlighted):

    | UID | SupplyAmount | BurdenIntensity |
    |-----|--------------|-----------------|
    | 0   | 1            | 0.1             |
    | 1   | **0**        | 0.5             |
    | 2   | 0.2          | **2.1**         |

    the function returns a DataFrame of the kind:

    | UID | SupplyAmount | SupplyAmount_USER | BurdenIntensity | BurdenIntensity_USER |
    |-----|--------------|-------------------|-----------------|----------------------|
    | 0   | 1            | NaN               | 0.1             | NaN                  |
    | 1   | 0.5          | **0**             | 0.5             | NaN                  |
    | 2   | 0.2          | NaN               | 0.3             | **2.1**              |

    Parameters
    ----------
    df_original : pd.DataFrame
        Original DataFrame.   
        Must have at least columns `'UID', 'SupplyAmount', 'BurdenIntensity'`.
    df_user_input : pd.DataFrame
        User input DataFrame.  
        Must have at least columns `'UID', 'SupplyAmount', 'BurdenIntensity'`.

    Returns
    -------
    pd.DataFrame
        Output DataFrame.

    Raises
    ------
    ValueError
        If the set of UIDs in `df_original` and `df_user_input` do
        not match exactly.
    """
    uids_original = set(df_original['UID'])
    uids_user_input = set(df_user_input['UID'])

    if uids_original != uids_user_input:
        raise ValueError("UIDs in original and user input dataframes do not match.")

    df_merged = pd.merge(
        df_original,
        df_user_input[['UID', 'SupplyAmount', 'BurdenIntensity']],
        on='UID',
        how='left',
        suffixes=('', '_USER')
    )
    for column_name in ['SupplyAmount', 'BurdenIntensity']:
        df_merged[f'{column_name}_USER'] = np.where(
            df_merged[f'{column_name}_USER'] != df_merged[f'{column_name}'],
            df_merged[f'{column_name}_USER'],
            np.nan
        )
    return df_merged

`_determine_edited_rows(df)` ¶

Given a dataframe with at least columns 'SupplyAmount_USER', 'BurdenIntensity_USER', returns a DataFrame with a new column that indicates whether a row has been edited by the user.

For instance, given a DataFrame of the kind:

UID	SupplyAmount_USER	BurdenIntensity_USER
0	NaN	NaN
1	0.25	NaN
2	NaN	2.1
3	NaN	NaN

the function returns a DataFrame of the kind:

UID	SupplyAmount_USER	BurdenIntensity_USER	Edited?
0	NaN	NaN	False
1	0.25	NaN	True
2	NaN	2.1	True
3	NaN	NaN	False

Source code in src/brightwebapp/modifications.py

def _determine_edited_rows(df: pd.DataFrame) -> pd.DataFrame:
    """
    Given a dataframe with at least columns `'SupplyAmount_USER', 'BurdenIntensity_USER'`,
    returns a DataFrame with a new column that indicates whether a row has been edited by the user.

    For instance, given a DataFrame of the kind:

    | UID | SupplyAmount_USER | BurdenIntensity_USER |
    |-----|-------------------|----------------------|
    | 0   | NaN               | NaN                  |
    | 1   | 0.25              | NaN                  |
    | 2   | NaN               | 2.1                  |
    | 3   | NaN               | NaN                  |

    the function returns a DataFrame of the kind:

    | UID | SupplyAmount_USER | BurdenIntensity_USER | Edited? |
    |-----|-------------------|----------------------|---------|
    | 0   | NaN               | NaN                  | False   |
    | 1   | 0.25              | NaN                  | True    |
    | 2   | NaN               | 2.1                  | True    |
    | 3   | NaN               | NaN                  | False   |
    """
    df['Edited?'] = df[['SupplyAmount_USER', 'BurdenIntensity_USER']].notnull().any(axis=1)
    return df

`_update_burden_based_on_user_data(df)` ¶

Updates the environmental burden of nodes by multiplying the burden intensity and the supply amount.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input dataframe.	required

Returns:

Type	Description
`DataFrame`	Output dataframe.

Source code in src/brightwebapp/modifications.py

def _update_burden_based_on_user_data(df: pd.DataFrame) -> pd.DataFrame:
    """
    Updates the environmental burden of nodes
    by multiplying the burden intensity and the supply amount.

    Parameters
    ----------
    df : pd.DataFrame
        Input dataframe.

    Returns
    -------
    pd.DataFrame
        Output dataframe.
    """

    df['Burden(Direct)'] = df['SupplyAmount'] * df['BurdenIntensity']
    return df

`_update_burden_intensity_based_on_user_data(df)` ¶

Given a dataframe with columns 'BurdenIntensity', 'BurdenIntensity_USER', overwrites the values in BurdenIntensity whenever values are provided in BurdenIntensity_USER.

For instance, given a DataFrame of the kind:

UID	BurdenIntensity	BurdenIntensity_USER
0	0.1	NaN
1	0.5	0.25
2	0.3	NaN

the function returns a DataFrame of the kind:

UID	BurdenIntensity
0	0.1
1	0.25
2	0.3

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input DataFrame.	required

Returns:

Type	Description
`DataFrame`	Output dataframe.

Source code in src/brightwebapp/modifications.py

def _update_burden_intensity_based_on_user_data(df: pd.DataFrame) -> pd.DataFrame:
    """
    Given a dataframe with columns `'BurdenIntensity', 'BurdenIntensity_USER'`,
    overwrites the values in `BurdenIntensity` whenever values are provided in `BurdenIntensity_USER`.

    For instance, given a DataFrame of the kind:

    | UID | BurdenIntensity | BurdenIntensity_USER |
    |-----|-----------------|----------------------|
    | 0   | 0.1             | NaN                  |
    | 1   | 0.5             | 0.25                 |
    | 2   | 0.3             | NaN                  |

    the function returns a DataFrame of the kind:

    | UID | BurdenIntensity |
    |-----|-----------------|
    | 0   | 0.1             |
    | 1   | 0.25            |
    | 2   | 0.3             |

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame.

    Returns
    -------
    pd.DataFrame
        Output dataframe.
    """
    df['BurdenIntensity'] = df['BurdenIntensity_USER'].combine_first(df['BurdenIntensity'])
    df = df.drop(columns=['BurdenIntensity_USER'])
    return df

`_update_production_based_on_user_data(df)` ¶

Updates the production amount of all nodes which are upstream of a node with user-supplied production amount. If an upstream node has half the use-supplied production amount, then the production amount of all downstream node is also halved.

For instance, given a DataFrame of the kind:

UID	SupplyAmount	SupplyAmount_USER	Branch
0	1	NaN	NaN
1	0.5	0.25	[0,1]
2	0.2	NaN	[0,1,2]
3	0.1	NaN	[0,3]
4	0.1	0.18	[0,1,2,4]
5	0.05	NaN	[0,1,2,4,5]
6	0.01	NaN	[0,1,2,4,5,6]

the function returns a DataFrame of the kind:

UID	SupplyAmount	Branch	Updated?
0	1	NaN	False
1	0.25	[0,1]	False
2	0.2 * (0.25/0.5)	[0,1,2]	True
3	0.1	[0,3]	False
4	0.18	[0,1,2,4]	False
5	0.05 * (0.18/0.1)	[0,1,2,4,5]	True
6	0.01 * (0.18/0.1)	[0,1,2,4,5,6]	True

Notes

As we can see, the function updates production only for those nodes upstream of a node with 'production_user':

Node 2 is upstream of node 1, which has a 'production_user' value.
Node 3 is NOT upstream of node 1. It is upstream of node 0, but node 0 does not have a 'production_user' value.

As we can see, the function always takes the "most recent" 'production_user' value upstream of a node:

Node 5 is upstream of node 4, which has a 'production_user' value.
Node 4 is upstream of node 1, which also has a 'production_user' value.

In this case, the function takes the 'production_user' value of node 4, not of node 1.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input DataFrame. Must have the columns 'production', 'production_user' and 'branch'.	required

Returns:

Type	Description
`DataFrame`	Output DataFrame.

Source code in src/brightwebapp/modifications.py

def _update_production_based_on_user_data(df: pd.DataFrame) -> pd.DataFrame:
    """
    Updates the production amount of all nodes which are upstream
    of a node with user-supplied production amount.
    If an upstream node has half the use-supplied production amount,
    then the production amount of all downstream node is also halved.

    For instance, given a DataFrame of the kind:

    | UID | SupplyAmount | SupplyAmount_USER | Branch        |
    |-----|--------------|-------------------|---------------|
    | 0   | 1            | NaN               | NaN           |
    | 1   | 0.5          | 0.25              | [0,1]         |
    | 2   | 0.2          | NaN               | [0,1,2]       |
    | 3   | 0.1          | NaN               | [0,3]         |
    | 4   | 0.1          | 0.18              | [0,1,2,4]     |
    | 5   | 0.05         | NaN               | [0,1,2,4,5]   |
    | 6   | 0.01         | NaN               | [0,1,2,4,5,6] |

    the function returns a DataFrame of the kind:

    | UID | SupplyAmount      | Branch        | Updated? |
    |-----|-------------------|---------------|----------|
    | 0   | 1                 | NaN           | False    |
    | 1   | 0.25              | [0,1]         | False    |
    | 2   | 0.2 * (0.25/0.5)  | [0,1,2]       | True     |
    | 3   | 0.1               | [0,3]         | False    |
    | 4   | 0.18              | [0,1,2,4]     | False    |
    | 5   | 0.05 * (0.18/0.1) | [0,1,2,4,5]   | True     |
    | 6   | 0.01 * (0.18/0.1) | [0,1,2,4,5,6] | True     |

    Notes
    -----

    As we can see, the function updates production only
    for those nodes upstream of a node with 'production_user':

    - Node 2 is upstream of node 1, which has a 'production_user' value.
    - Node 3 is NOT upstream of node 1. It is upstream of node 0, but node 0 does not have a 'production_user' value.

    As we can see, the function always takes the "most recent"
    'production_user' value upstream of a node:

    - Node 5 is upstream of node 4, which has a 'production_user' value.
    - Node 4 is upstream of node 1, which also has a 'production_user' value.

    In this case, the function takes the 'production_user' value of node 4, not of node 1.

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame. Must have the columns 'production', 'production_user' and 'branch'.

    Returns
    -------
    pd.DataFrame
        Output DataFrame.
    """
    df_filtered = df.dropna(subset=['SupplyAmount_USER'])
    dict_user_input = df_filtered.set_index('UID')['SupplyAmount_USER'].to_dict()
    dict_original_amount = df.set_index('UID')['SupplyAmount'].to_dict()

    df_copy = df.copy()

    def get_new_values(row) -> tuple[float, bool]:
        if not pd.isna(row['SupplyAmount_USER']):
            return (row['SupplyAmount_USER'], False)

        elif not isinstance(row['Branch'], list):
            return (row['SupplyAmount'], False)

        elif set(dict_user_input.keys()).intersection(row['Branch']):
            for branch_UID in reversed(row['Branch']):
                if branch_UID in dict_user_input:
                    original_upstream = dict_original_amount[branch_UID]
                    user_upstream = dict_user_input[branch_UID]

                    if original_upstream == 0:
                        return (0, True) 
                    else:
                        new_amount = row['SupplyAmount'] * (user_upstream / original_upstream)
                        return (new_amount, True) 

        else:
            return (row['SupplyAmount'], False)

    results = df_copy.apply(get_new_values, axis=1)
    df_copy[['SupplyAmount_EDITED', 'Updated?']] = pd.DataFrame(results.tolist(), index=df_copy.index)

    df_copy['SupplyAmount'] = df_copy['SupplyAmount_EDITED']
    df_copy.drop(columns=['SupplyAmount_USER', 'SupplyAmount_EDITED'], inplace=True)

    return df_copy

Modifications

src.brightwebapp.modifications ¶

_create_user_input_columns(df_original, df_user_input) ¶

_determine_edited_rows(df) ¶

_update_burden_based_on_user_data(df) ¶

_update_burden_intensity_based_on_user_data(df) ¶

_update_production_based_on_user_data(df) ¶

`src.brightwebapp.modifications` ¶

`_create_user_input_columns(df_original, df_user_input)` ¶

`_determine_edited_rows(df)` ¶

`_update_burden_based_on_user_data(df)` ¶

`_update_burden_intensity_based_on_user_data(df)` ¶

`_update_production_based_on_user_data(df)` ¶