tidypolars.tibble
¶
Module Contents¶
Classes¶
A data frame object that provides methods familiar to R tidyverse users. |
Functions¶
|
Mark a column to order in descending |
|
Convert from polars DataFrame to Tibble |
|
Convert from pandas DataFrame to Tibble |
- class Tibble(_data=None, **kwargs)[source]¶
Bases:
tidypolars.reexports.pl.DataFrame
A data frame object that provides methods familiar to R tidyverse users.
- property names¶
Get column names
Examples
>>> df.names
- property ncol¶
Get number of columns
Examples
>>> df.ncol
- property nrow¶
Get number of rows
Examples
>>> df.nrow
- _repr_html_()[source]¶
Printing method for jupyter
Output rows and columns can be modified by setting the following ENVIRONMENT variables:
POLARS_FMT_MAX_COLS: set the number of columns
POLARS_FMT_MAX_ROWS: set the number of rows
- arrange(*args)[source]¶
Arrange/sort rows
- Parameters:
*args (str) – Columns to sort by
Examples
>>> df = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> # Arrange in ascending order >>> df.arrange('x', 'y') ... >>> # Arrange some columns descending >>> df.arrange(tp.desc('x'), 'y')
- bind_cols(*args)[source]¶
Bind data frames by columns
- Parameters:
df (Tibble) – Data frame to bind
Examples
>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> df2 = tp.Tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)}) >>> df1.bind_cols(df2)
- bind_rows(*args)[source]¶
Bind data frames by row
- Parameters:
*args (Tibble, list) – Data frames to bind by row
Examples
>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> df2 = tp.Tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)}) >>> df1.bind_rows(df2)
- count(*args, sort=False, name='n')[source]¶
Returns row counts of the dataset. If bare column names are provided, count() returns counts by group.
- Parameters:
*args (str, Expr) – Columns to group by
sort (bool) – Should columns be ordered in descending order by count
name (str) – The name of the new column in the output. If omitted, it will default to “n”.
Examples
>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.count() >>> df.count('b')
- distinct(*args)[source]¶
Select distinct/unique rows
- Parameters:
*args (str, Expr) – Columns to find distinct/unique rows
Examples
>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.distinct() >>> df.distinct('b')
- drop(*args)[source]¶
Drop unwanted columns
- Parameters:
*args (str) – Columns to drop
Examples
>>> df.drop('x', 'y')
- drop_null(*args)[source]¶
Drop rows containing missing values
- Parameters:
*args (str) – Columns to drop nulls from (defaults to all)
Examples
>>> df = tp.Tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3)} >>> df.drop_null() >>> df.drop_null('x', 'y')
- fill(*args, direction='down', by=None)[source]¶
Fill in missing values with previous or next value
- Parameters:
*args (str) – Columns to fill
direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': [1, None, 3, 4, 5], ... 'b': [None, 2, None, None, 5], ... 'groups': ['a', 'a', 'a', 'b', 'b']}) >>> df.fill('a', 'b') >>> df.fill('a', 'b', by = 'groups') >>> df.fill('a', 'b', direction = 'downup')
- filter(*args, by=None)[source]¶
Filter rows on one or more conditions
- Parameters:
*args (Expr) – Conditions to filter by
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.filter(col('a') < 2, col('b') == 'a') >>> df.filter((col('a') < 2) & (col('b') == 'a')) >>> df.filter(col('a') <= tp.mean(col('a')), by = 'b')
- inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform an inner join
- Parameters:
df (Tibble) – Lazy DataFrame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.inner_join(df2) >>> df1.inner_join(df2, on = 'x') >>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')
- left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform a left join
- Parameters:
df (Tibble) – Lazy DataFrame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.left_join(df2) >>> df1.left_join(df2, on = 'x') >>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
- mutate(*args, by=None, **kwargs)[source]¶
Add or modify columns
- Parameters:
*args (Expr) – Column expressions to add or modify
by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), c = ['a', 'a', 'b']}) >>> df.mutate(double_a = col('a') * 2, ... a_plus_b = col('a') + col('b')) >>> df.mutate(row_num = row_number(), by = 'c')
- full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')[source]¶
Perform an full join
- Parameters:
df (Tibble) – Lazy DataFrame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.full_join(df2) >>> df1.full_join(df2, on = 'x') >>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')
- pivot_longer(cols=everything(), names_to='name', values_to='value')[source]¶
Pivot data from wide to long
- Parameters:
cols (Expr) – List of the columns to pivot. Defaults to all columns.
names_to (str) – Name of the new “names” column.
values_to (str) – Name of the new “values” column
Examples
>>> df = tp.Tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]}) >>> df.pivot_longer(cols = ['a', 'b']) >>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
- pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)[source]¶
Pivot data from long to wide
- Parameters:
names_from (str) – Column to get the new column names from.
values_from (str) – Column to get the new column values from
id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.
values_fn (str) – Function for how multiple entries per group should be dealt with. Any of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’
values_fill (str) – If values are missing/null, what value should be filled in. Can use: “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”
Examples
>>> df = tp.Tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]}) >>> df.pivot_wider(names_from = 'variable', values_from = 'value')
- pull(var=None)[source]¶
Extract a column as a series
- Parameters:
var (str) – Name of the column to extract. Defaults to the last column.
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3)) >>> df.pull('a')
- relocate(*args, before=None, after=None)[source]¶
Move a column or columns to a new position
- Parameters:
*args (str, Expr) – Columns to move
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.relocate('a', before = 'c') >>> df.relocate('b', after = 'c')
- rename(*args, **kwargs)[source]¶
Rename columns
- Parameters:
*args (dict) – Dictionary mapping of new names
**kwargs (str) – key-value pair of new name from old name
Examples
>>> df = tp.Tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']}) >>> df.rename(new_x = 'x') # dplyr interface >>> df.rename({'x': 'new_x'}) # pandas interface
- replace_null(replace=None)[source]¶
Replace null values
- Parameters:
replace (dict) – Dictionary of column/replacement pairs
Examples
>>> df = tp.Tibble(x = [0, None], y = [None, None]) >>> df.replace_null(dict(x = 1, y = 2))
- separate(sep_col, into, sep='_', remove=True)[source]¶
Separate a character column into multiple columns
- Parameters:
sep_col (str) – Column to split into multiple columns
into (list) – List of new column names
sep (str) – Separator to split on. Default to ‘_’
remove (bool) – If True removes the input column from the output data frame
Examples
>>> df = tp.Tibble(x = ['a_a', 'b_b', 'c_c']) >>> df.separate('x', into = ['left', 'right'])
- set_names(nm=None)[source]¶
Change the column names of the data frame
- Parameters:
nm (list) – A list of new names for the data frame
Examples
>>> df = tp.Tibble(x = range(3), y = range(3)) >>> df.set_names(['a', 'b'])
- select(*args)[source]¶
Select or drop columns
- Parameters:
*args (str, Expr) – Columns to select
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.select('a', 'b') >>> df.select(col('a'), col('b'))
- slice(*args, by=None)[source]¶
Grab rows from a data frame
- Parameters:
*args (int, list) – Rows to grab
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice(0, 1) >>> df.slice(0, by = 'c')
- slice_head(n=5, *, by=None)[source]¶
Grab top rows from a data frame
- Parameters:
n (int) – Number of rows to grab
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice_head(2) >>> df.slice_head(1, by = 'c')
- slice_tail(n=5, *, by=None)[source]¶
Grab bottom rows from a data frame
- Parameters:
n (int) – Number of rows to grab
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice_tail(2) >>> df.slice_tail(1, by = 'c')
- summarize(*args, by=None, **kwargs)[source]¶
Aggregate data with summary statistics
- Parameters:
*args (Expr) – Column expressions to add or modify
by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.summarize(avg_a = tp.mean(col('a'))) >>> df.summarize(avg_a = tp.mean(col('a')), ... by = 'c') >>> df.summarize(avg_a = tp.mean(col('a')), ... max_b = tp.max(col('b')))
- to_dict(as_series=True)[source]¶
Aggregate data with summary statistics
- Parameters:
as_series (bool) – If True - returns the dict values as Series If False - returns the dict values as lists
Examples
>>> df.to_dict() >>> df.to_dict(as_series = False)
- unite(col='_united', unite_cols=[], sep='_', remove=True)[source]¶
Unite multiple columns by pasting strings together
- Parameters:
col (str) – Name of the new column
unite_cols (list) – List of columns to unite
sep (str) – Separator to use between values
remove (bool) – If True removes input columns from the data frame
Examples
>>> df = tp.Tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3)) >>> df.unite("united_col", unite_cols = ["a", "b"])