tidypolars.tibble_df¶
Classes¶
A data frame object that provides methods familiar to R tidyverse users.
Functions¶
Mark a column to order in descending order
Convert an object to a tibble
Check whether an object is a tibble
Convert from polars DataFrame to tibble
Convert from pandas DataFrame to tibble
Module Contents¶
- class tibble(_data=None, **kwargs)[source]¶
Bases: tidypolars.reexports.pl.DataFrame
A data frame object that provides methods familiar to R tidyverse users.
- _repr_html_()[source]¶
Printing method for Jupyter
The number of rows and columns shown can be changed by setting the following environment variables:
POLARS_FMT_MAX_COLS: set the number of columns
POLARS_FMT_MAX_ROWS: set the number of rows
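A minimal sketch of setting these variables from Python, assuming they are read when the frame is rendered and so need to be set before the data frame is displayed. The limits 20 and 50 are illustrative values, not defaults.

```python
import os

# Environment variable values must be strings.
# 20 and 50 are illustrative limits, not library defaults.
os.environ["POLARS_FMT_MAX_COLS"] = "20"
os.environ["POLARS_FMT_MAX_ROWS"] = "50"
```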
- __getitem__(col)[source]¶
Get part of the DataFrame as a new DataFrame, Series, or scalar.
- Parameters:
key –
Rows / columns to select. This is easiest to explain via example. Suppose we have a DataFrame with columns ‘a’, ‘d’, ‘c’, ‘b’. Here is what various types of key would do:
df[0, ‘a’] extracts the first element of column ‘a’ and returns a scalar.
df[0] extracts the first row and returns a DataFrame.
df[‘a’] extracts column ‘a’ and returns a Series.
df[0:2] extracts the first two rows and returns a DataFrame.
df[0:2, ‘a’] extracts the first two rows from column ‘a’ and returns a Series.
df[0:2, 0] extracts the first two rows from the first column and returns a Series.
df[[0, 1], [0, 1, 2]] extracts the first two rows and the first three columns and returns a DataFrame.
df[0:2, [‘a’, ‘c’]] extracts the first two rows from columns ‘a’ and ‘c’ and returns a DataFrame.
df[:, 0:2] extracts all rows from the first two columns and returns a DataFrame.
df[:, ‘a’:‘c’] extracts all rows and all columns positioned between ‘a’ and ‘c’ inclusive and returns a DataFrame. In our example, that would extract columns ‘a’, ‘d’, and ‘c’.
- Return type:
DataFrame, Series, or scalar, depending on key.
Examples
>>> df = pl.DataFrame(
...     {"a": [1, 2, 3], "d": [4, 5, 6], "c": [1, 3, 2], "b": [7, 8, 9]}
... )
>>> df[0]
shape: (1, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
└─────┴─────┴─────┴─────┘
>>> df[0, "a"]
1
>>> df["a"]
shape: (3,)
Series: 'a' [i64]
[
    1
    2
    3
]
>>> df[0:2]
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
│ 2   ┆ 5   ┆ 3   ┆ 8   │
└─────┴─────┴─────┴─────┘
>>> df[0:2, "a"]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[0:2, 0]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[[0, 1], [0, 1, 2]]
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
└─────┴─────┴─────┘
>>> df[0:2, ["a", "c"]]
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ c   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 3   │
└─────┴─────┘
>>> df[:, 0:2]
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ d   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
>>> df[:, "a":"c"]
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
│ 3   ┆ 6   ┆ 2   │
└─────┴─────┴─────┘
- arrange(*args)[source]¶
Arrange/sort rows
- Parameters:
*args (str) – Columns to sort by
Examples
>>> df = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')
- as_dict(*, as_series=True)[source]¶
Convert the data frame to a dictionary
- Parameters:
as_series (bool) – If True, returns the dict values as Series. If False, returns the dict values as lists.
Examples
>>> df.as_dict()
>>> df.as_dict(as_series = False)
- bind_cols(*args)[source]¶
Bind data frames by columns
- Parameters:
*args (tibble) – Data frames to bind by column
Examples
>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)
- bind_rows(*args)[source]¶
Bind data frames by row
- Parameters:
*args (tibble, list) – Data frames to bind by row
Examples
>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)
- count(*args, sort=False, name='n')[source]¶
Returns row counts of the dataset. If bare column names are provided, count() returns counts by group.
- Parameters:
*args (str, Expr) – Columns to group by
sort (bool) – If True, order the output rows in descending order by count
name (str) – The name of the new column in the output. If omitted, it will default to “n”.
Examples
>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
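As a rough mental model of what the two calls above compute, the following standard-library sketch tallies rows overall and per group. It is not tidypolars' implementation: `count_rows` is a hypothetical helper, and the data simply mirrors the example.

```python
from collections import Counter

# Column-oriented data mirroring the example above.
data = {"a": [0, 1, 2], "b": ["a", "a", "b"]}

def count_rows(data, *group_cols, name="n"):
    """Hypothetical helper: tally rows, optionally per group."""
    n_rows = len(next(iter(data.values())))
    if not group_cols:
        return {name: [n_rows]}
    # Build one key tuple per row, then tally identical keys.
    keys = zip(*(data[c] for c in group_cols))
    tally = Counter(keys)
    out = {c: [k[i] for k in tally] for i, c in enumerate(group_cols)}
    out[name] = list(tally.values())
    return out

total = count_rows(data)      # one row holding the overall row count
by_b = count_rows(data, "b")  # one row per distinct value of 'b'
```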
- distinct(*args)[source]¶
Select distinct/unique rows
- Parameters:
*args (str, Expr) – Columns to find distinct/unique rows
Examples
>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')
- drop(*args)[source]¶
Drop unwanted columns
- Parameters:
*args (str) – Columns to drop
Examples
>>> df.drop('x', 'y')
- drop_null(*args)[source]¶
Drop rows containing missing values
- Parameters:
*args (str) – Columns to drop nulls from (defaults to all)
Examples
>>> df = tp.tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')
- glimpse()[source]¶
Return a dense preview of the DataFrame.
The formatting shows one line per column so that wide dataframes display cleanly. Each line shows the column name, the data type, and the first few values.
- fill(*args, direction='down', _by=None)[source]¶
Fill in missing values with previous or next value
- Parameters:
*args (str) – Columns to fill
direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', _by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
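The four direction values can be illustrated on a plain list. This sketch assumes the semantics follow tidyr's `fill()`, where 'downup' fills down first and then up, and 'updown' does the reverse. `fill_values` and its helpers are hypothetical and not part of tidypolars.

```python
def fill_down(values):
    """Carry the last non-null value forward."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def fill_up(values):
    """Carry the next non-null value backward."""
    return fill_down(values[::-1])[::-1]

def fill_values(values, direction="down"):
    """Hypothetical helper mirroring the four direction options."""
    steps = {
        "down": [fill_down],
        "up": [fill_up],
        "downup": [fill_down, fill_up],
        "updown": [fill_up, fill_down],
    }
    for step in steps[direction]:
        values = step(values)
    return values

b = [None, 2, None, None, 5]       # column 'b' from the example above
down = fill_values(b, "down")      # leading null stays: nothing precedes it
up = fill_values(b, "up")          # every null takes the next known value
downup = fill_values(b, "downup")  # the up pass fills the leading null
```

Under these assumptions, a plain 'down' fill leaves a leading null in place, which is the case the combined directions are meant to cover.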
- filter(*args, _by=None)[source]¶
Filter rows on one or more conditions
- Parameters:
*args (Expr) – Conditions to filter by
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), _by = 'b')
- full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')[source]¶
Perform a full join
- Parameters:
df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')
- inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform an inner join
- Parameters:
df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')
- left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform a left join
- Parameters:
df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
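The difference between the inner, left, and full joins documented above comes down to which unmatched rows are kept. The sketch below shows this on lists of row dicts using only the standard library. `join_rows` is a hypothetical helper, the data is made up, and it assumes the non-key columns do not clash, so the `suffix` handling is not modelled.

```python
left = [{"x": 1, "l": "a"}, {"x": 2, "l": "b"}]
right = [{"x": 2, "r": "c"}, {"x": 3, "r": "d"}]

def join_rows(left, right, on, how="inner"):
    """Hypothetical row-wise join; assumes value columns do not clash."""
    left_cols = [c for c in left[0] if c != on]
    right_cols = [c for c in right[0] if c != on]
    rows, matched = [], set()
    for lrow in left:
        hits = [i for i, r in enumerate(right) if r[on] == lrow[on]]
        for i in hits:
            rows.append({**lrow, **right[i]})
            matched.add(i)
        # Left and full joins keep unmatched left rows, padded with nulls.
        if not hits and how in ("left", "full"):
            rows.append({**lrow, **{c: None for c in right_cols}})
    # A full join also keeps unmatched right rows.
    if how == "full":
        for i, r in enumerate(right):
            if i not in matched:
                rows.append({**{c: None for c in left_cols}, **r})
    return rows

inner = join_rows(left, right, on="x", how="inner")
left_only = join_rows(left, right, on="x", how="left")
full = join_rows(left, right, on="x", how="full")
```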
- mutate(*args, _by=None, **kwargs)[source]¶
Add or modify columns
- Parameters:
*args (Expr) – Column expressions to add or modify
_by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = row_number(), _by = 'c')
- pivot_longer(cols=everything(), names_to='name', values_to='value')[source]¶
Pivot data from wide to long
- Parameters:
cols (Expr) – List of the columns to pivot. Defaults to all columns.
names_to (str) – Name of the new “names” column.
values_to (str) – Name of the new “values” column
Examples
>>> df = tp.tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
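To make the wide-to-long reshaping concrete, here is a standard-library sketch on column-oriented data. `pivot_longer_sketch` is a hypothetical helper, not the library's implementation. The output row order (one block per pivoted column) is an assumption and may differ from what tidypolars returns.

```python
wide = {"id": ["id1", "id2"], "a": [1, 2], "b": [1, 2]}

def pivot_longer_sketch(data, cols, names_to="name", values_to="value"):
    """Hypothetical helper: stack `cols` into name/value pairs."""
    id_cols = [c for c in data if c not in cols]
    out = {c: [] for c in id_cols}
    out[names_to] = []
    out[values_to] = []
    for col in cols:  # one block of rows per pivoted column
        for i, v in enumerate(data[col]):
            for c in id_cols:
                out[c].append(data[c][i])  # repeat the identifier values
            out[names_to].append(col)      # old column name becomes a value
            out[values_to].append(v)
    return out

long = pivot_longer_sketch(wide, ["a", "b"])
renamed = pivot_longer_sketch(wide, ["a", "b"], names_to="stuff", values_to="things")
```

Each pivoted column contributes one row per original row, so two rows and two pivoted columns give four rows.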
- pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)[source]¶
Pivot data from long to wide
- Parameters:
names_from (str) – Column to get the new column names from.
values_from (str) – Column to get the new column values from
id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.
values_fn (str) – Function for how multiple entries per group should be dealt with. Any of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’
values_fill (str) – If values are missing/null, what value should be filled in. Can use: “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”
Examples
>>> df = tp.tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
- pull(var=None)[source]¶
Extract a column as a series
- Parameters:
var (str) – Name of the column to extract. Defaults to the last column.
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')
- relocate(*args, _before=None, _after=None)[source]¶
Move a column or columns to a new position
- Parameters:
*args (str, Expr) – Columns to move
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', _before = 'c')
>>> df.relocate('b', _after = 'c')
- rename(_mapping=None, **kwargs)[source]¶
Rename columns
- Parameters:
_mapping (dict) – Dictionary mapping of new names
**kwargs (str) – key-value pair of new name from old name
Examples
>>> df = tp.tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface
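The two interfaces point in opposite directions: the keyword form is `new = 'old'`, while the dictionary form is `{'old': 'new'}`. The sketch below, based only on the two example calls above, normalises both to a single old-to-new mapping. `resolve_renames` is a hypothetical helper, not part of tidypolars.

```python
def resolve_renames(_mapping=None, **kwargs):
    """Hypothetical helper: normalise both styles to {old: new}."""
    old_to_new = dict(_mapping or {})
    # Keyword style is new_name='old_name', so flip each pair.
    old_to_new.update({old: new for new, old in kwargs.items()})
    return old_to_new

columns = ["x", "t", "z"]
from_kwargs = resolve_renames(new_x="x")       # dplyr-style call
from_dict = resolve_renames({"x": "new_x"})    # pandas-style call
renamed = [from_kwargs.get(c, c) for c in columns]
```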
- replace_null(replace=None)[source]¶
Replace null values
- Parameters:
replace (dict) – Dictionary of column/replacement pairs
Examples
>>> df = tp.tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))
- separate(sep_col, into, sep='_', remove=True)[source]¶
Separate a character column into multiple columns
- Parameters:
sep_col (str) – Column to split into multiple columns
into (list) – List of new column names
sep (str) – Separator to split on. Defaults to ‘_’
remove (bool) – If True removes the input column from the output data frame
Examples
>>> df = tp.tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
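As an illustration of the split itself, this standard-library sketch turns each string into one value per new column. `separate_sketch` is a hypothetical helper. It assumes every value contains exactly `len(into)` pieces and does not model the `remove` option.

```python
def separate_sketch(values, into, sep="_"):
    """Hypothetical helper: split each string into len(into) pieces."""
    out = {name: [] for name in into}
    for v in values:
        parts = v.split(sep)
        # Assumption: each value yields exactly len(into) parts.
        for name, part in zip(into, parts):
            out[name].append(part)
    return out

x = ["a_a", "b_b", "c_c"]
pieces = separate_sketch(x, into=["left", "right"])
```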
- set_names(nm=None)[source]¶
Change the column names of the data frame
- Parameters:
nm (list) – A list of new names for the data frame
Examples
>>> df = tp.tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])
- select(*args)[source]¶
Select or drop columns
- Parameters:
*args (str, Expr) – Columns to select
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))
- slice(*args, _by=None)[source]¶
Grab rows from a data frame
- Parameters:
*args (int, list) – Rows to grab
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, _by = 'c')
- slice_head(n=5, *, _by=None)[source]¶
Grab top rows from a data frame
- Parameters:
n (int) – Number of rows to grab
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, _by = 'c')
- slice_tail(n=5, *, _by=None)[source]¶
Grab bottom rows from a data frame
- Parameters:
n (int) – Number of rows to grab
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, _by = 'c')
- summarize(*args, _by=None, **kwargs)[source]¶
Aggregate data with summary statistics
- Parameters:
*args (Expr) – Aggregation expressions to compute
_by (str, list) – Columns to group by
**kwargs (Expr) – Named aggregation expressions to compute
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              _by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
- unite(col='_united', unite_cols=[], sep='_', remove=True)[source]¶
Unite multiple columns by pasting strings together
- Parameters:
col (str) – Name of the new column
unite_cols (list) – List of columns to unite
sep (str) – Separator to use between values
remove (bool) – If True removes input columns from the data frame
Examples
>>> df = tp.tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
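The pasting step can be sketched with the standard library as below. `unite_sketch` is a hypothetical helper. Placing the united column last is an assumption made for simplicity, and the library may position it differently.

```python
def unite_sketch(data, col, unite_cols, sep="_", remove=True):
    """Hypothetical helper: paste `unite_cols` together row by row."""
    rows = zip(*(data[c] for c in unite_cols))
    united = [sep.join(str(v) for v in row) for row in rows]
    # Optionally drop the input columns, then append the new one.
    out = {c: v for c, v in data.items() if not (remove and c in unite_cols)}
    out[col] = united
    return out

data = {"a": ["a", "a", "a"], "b": ["b", "b", "b"], "c": [0, 1, 2]}
united = unite_sketch(data, "united_col", unite_cols=["a", "b"])
kept = unite_sketch(data, "united_col", unite_cols=["a", "b"], remove=False)
```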
- write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs)[source]¶
Write a data frame to a Parquet file
- property names¶
Get column names
Examples
>>> df.names
- property ncol¶
Get number of columns
Examples
>>> df.ncol
- property nrow¶
Get number of rows
Examples
>>> df.nrow
- property plot¶
Access to polars plotting
Examples
>>> df.plot
- as_tibble(x)[source]¶
Convert an object to a tibble
- Parameters:
x ([pl.DataFrame, pd.DataFrame, dict]) – Object to convert to a tibble
Examples
>>> tp.as_tibble(polars_df)