tidypolars.tibble

Module Contents

Classes

Tibble

A data frame object that provides methods familiar to R tidyverse users.

Functions

desc(x)

Mark a column to sort in descending order

from_polars(df)

Convert from polars DataFrame to Tibble

from_pandas(df)

Convert from pandas DataFrame to Tibble

class Tibble(_data=None, **kwargs)[source]

Bases: tidypolars.reexports.pl.DataFrame

A data frame object that provides methods familiar to R tidyverse users.

property names

Get column names

Examples

>>> df.names
property ncol

Get number of columns

Examples

>>> df.ncol
property nrow

Get number of rows

Examples

>>> df.nrow
__repr__()[source]

Printing method

_repr_html_()[source]

Printing method for jupyter

The number of rows and columns shown can be changed by setting the following environment variables:

  • POLARS_FMT_MAX_COLS: maximum number of columns to display

  • POLARS_FMT_MAX_ROWS: maximum number of rows to display
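
Both names are ordinary process environment variables, so one way to set them from Python is through os.environ. This is a minimal sketch: the limits 20 and 10 are arbitrary illustrative values, and it assumes the variables are set before the data frame is printed.

```python
import os

# Environment variable values are always strings.
os.environ["POLARS_FMT_MAX_ROWS"] = "20"  # show at most 20 rows
os.environ["POLARS_FMT_MAX_COLS"] = "10"  # show at most 10 columns
```
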

__copy__()[source]
__str__()[source]

Printing method

__getattribute__(attr)[source]

Return getattr(self, attr).

__dir__()[source]

Default dir() implementation.

arrange(*args)[source]

Arrange/sort rows

Parameters:

*args (str) – Columns to sort by

Examples

>>> df = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')
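
To make the mixed ascending/descending ordering concrete, here is a plain-Python sketch of the same idea on a list of dict rows. It is not how tidypolars implements arrange(); arrange_sketch and its (column, descending) pairs are hypothetical names used only for illustration.

```python
rows = [{"x": "a", "y": 0}, {"x": "a", "y": 1}, {"x": "b", "y": 2}]

def arrange_sketch(rows, *keys):
    """Sort dict rows by (column, descending) pairs. Hypothetical helper."""
    out = list(rows)
    # Python's sort is stable, so applying the keys from last to first
    # produces a correct multi-column ordering.
    for column, descending in reversed(keys):
        out.sort(key=lambda row: row[column], reverse=descending)
    return out

# Comparable in spirit to df.arrange(tp.desc('x'), 'y')
result = arrange_sketch(rows, ("x", True), ("y", False))
```
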
bind_cols(*args)[source]

Bind data frames by columns

Parameters:

*args (Tibble, list) – Data frames to bind by column

Examples

>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)
bind_rows(*args)[source]

Bind data frames by row

Parameters:

*args (Tibble, list) – Data frames to bind by row

Examples

>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)
clone()[source]

Very cheap deep clone

count(*args, sort=False, name='n')[source]

Returns row counts of the dataset. If bare column names are provided, count() returns counts by group.

Parameters:
  • *args (str, Expr) – Columns to group by

  • sort (bool) – If True, sort the output rows in descending order of count

  • name (str) – The name of the new column in the output. If omitted, it will default to “n”.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
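
The difference between the ungrouped and grouped forms can be sketched with the standard library. This only illustrates the counting logic on column 'b' from the example above; it says nothing about the shape of the Tibble that count() returns.

```python
from collections import Counter

b = ["a", "a", "b"]

total = len(b)         # like df.count(): a single overall row count
by_group = Counter(b)  # like df.count('b'): one count per distinct value

# sort=True corresponds to listing the groups by descending count.
sorted_groups = by_group.most_common()
```
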
distinct(*args)[source]

Select distinct/unique rows

Parameters:

*args (str, Expr) – Columns to find distinct/unique rows

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')
drop(*args)[source]

Drop unwanted columns

Parameters:

*args (str) – Columns to drop

Examples

>>> df.drop('x', 'y')
drop_null(*args)[source]

Drop rows containing missing values

Parameters:

*args (str) – Columns to drop nulls from (defaults to all)

Examples

>>> df = tp.Tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')
head(n=5, *, by=None)[source]

Alias for .slice_head()

fill(*args, direction='down', by=None)[source]

Fill in missing values with previous or next value

Parameters:
  • *args (str) – Columns to fill

  • direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
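
The four directions are easier to compare on a single column. The sketch below assumes they behave like the options of the same name in tidyr's fill(), where 'downup' fills down first and then up; fill_sketch and its helpers are hypothetical and are not part of tidypolars.

```python
def fill_down(values):
    """Replace each None with the most recent non-None value above it."""
    out, last = [], None
    for value in values:
        if value is not None:
            last = value
        out.append(last)
    return out

def fill_up(values):
    """Replace each None with the next non-None value below it."""
    return fill_down(values[::-1])[::-1]

def fill_sketch(values, direction="down"):
    if direction == "down":
        return fill_down(values)
    if direction == "up":
        return fill_up(values)
    if direction == "downup":
        return fill_up(fill_down(values))
    if direction == "updown":
        return fill_down(fill_up(values))
    raise ValueError(f"unknown direction: {direction!r}")

b = [None, 2, None, None, 5]
down = fill_sketch(b, "down")      # the leading None has nothing above it
downup = fill_sketch(b, "downup")  # the second pass fills the leading None
```

With by set, the same logic would presumably run separately within each group, so values would not carry over from one group into the next.
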
filter(*args, by=None)[source]

Filter rows on one or more conditions

Parameters:
  • *args (Expr) – Conditions to filter by

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), by = 'b')
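
The grouped form in the last example compares each row with a statistic of its own group. A plain-Python sketch of that logic, assuming by causes the condition to be evaluated within each group of 'b':

```python
from statistics import mean

a = [0, 1, 2]
b = ["a", "a", "b"]

# Mean of 'a' within each group of 'b'.
group_values = {}
for value, group in zip(a, b):
    group_values.setdefault(group, []).append(value)
group_mean = {group: mean(values) for group, values in group_values.items()}

# Keep rows whose 'a' is at or below the mean of their own group.
kept = [(value, group) for value, group in zip(a, b)
        if value <= group_mean[group]]
```
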
frame_equal(other, null_equal=True)[source]

Check if two Tibbles are equal

inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]

Perform an inner join

Parameters:
  • df (Tibble) – Tibble to join with.

  • left_on (str, list) – Join column(s) of the left DataFrame.

  • right_on (str, list) – Join column(s) of the right DataFrame.

  • on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.

  • suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')
left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]

Perform a left join

Parameters:
  • df (Tibble) – Tibble to join with.

  • left_on (str, list) – Join column(s) of the left DataFrame.

  • right_on (str, list) – Join column(s) of the right DataFrame.

  • on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.

  • suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
mutate(*args, by=None, **kwargs)[source]

Add or modify columns

Parameters:
  • *args (Expr) – Column expressions to add or modify

  • by (str, list) – Columns to group by

  • **kwargs (Expr) – Column expressions to add or modify

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = row_number(), by = 'c')
full_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]

Perform a full join

Parameters:
  • df (Tibble) – Tibble to join with.

  • left_on (str, list) – Join column(s) of the left DataFrame.

  • right_on (str, list) – Join column(s) of the right DataFrame.

  • on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.

  • suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')
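
inner_join, left_join and full_join differ in which key values survive the join. The sketch below shows the conventional SQL-style meaning of the three join types on two small lists of rows. The data and the joined_keys helper are hypothetical, and the code tracks only the keys, not the joined columns or the suffix handling.

```python
left = [{"x": 1, "a": "l1"}, {"x": 2, "a": "l2"}]
right = [{"x": 2, "b": "r2"}, {"x": 3, "b": "r3"}]

def joined_keys(left, right, on, how):
    """Return the sorted key values present after each join type."""
    left_keys = {row[on] for row in left}
    right_keys = {row[on] for row in right}
    keys = {
        "inner": left_keys & right_keys,  # keys found in both frames
        "left": left_keys,                # every key of the left frame
        "full": left_keys | right_keys,   # every key of either frame
    }[how]
    return sorted(keys)

inner = joined_keys(left, right, on="x", how="inner")
left_only = joined_keys(left, right, on="x", how="left")
full = joined_keys(left, right, on="x", how="full")
```
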
pivot_longer(cols=everything(), names_to='name', values_to='value')[source]

Pivot data from wide to long

Parameters:
  • cols (Expr) – List of the columns to pivot. Defaults to all columns.

  • names_to (str) – Name of the new “names” column.

  • values_to (str) – Name of the new “values” column

Examples

>>> df = tp.Tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
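
The reshaping can be sketched on a list of dict rows: every pivoted column contributes one output row per input row, with the former column name stored under names_to and its value under values_to. pivot_longer_sketch is a hypothetical helper, and the row order it produces is not necessarily the order tidypolars returns.

```python
wide = [{"id": "id1", "a": 1, "b": 1}, {"id": "id2", "a": 2, "b": 2}]

def pivot_longer_sketch(rows, cols, names_to="name", values_to="value"):
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are repeated on every output row.
        kept = {k: v for k, v in row.items() if k not in cols}
        for column in cols:
            long_rows.append({**kept, names_to: column, values_to: row[column]})
    return long_rows

long_rows = pivot_longer_sketch(wide, cols=["a", "b"])
```
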
pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)[source]

Pivot data from long to wide

Parameters:
  • names_from (str) – Column to get the new column names from.

  • values_from (str) – Column to get the new column values from

  • id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.

  • values_fn (str) – Function for how multiple entries per group should be dealt with. Any of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’

  • values_fill (str) – If values are missing/null, what value should be filled in. Can use: “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”

Examples

>>> df = tp.Tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
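
The inverse reshaping, again as a plain-Python sketch with a hypothetical helper: rows that share the same id_cols values collapse into one record, and each distinct value of names_from becomes a column. Keeping the first value seen per cell mirrors the default values_fn of 'first'; other aggregations and values_fill are not modelled here.

```python
long_rows = [
    {"id": 1, "variable": "a", "value": 1},
    {"id": 1, "variable": "b", "value": 2},
]

def pivot_wider_sketch(rows, names_from, values_from, id_cols):
    wide = {}
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        record = wide.setdefault(key, dict(zip(id_cols, key)))
        # Keep only the first value seen for each new column.
        record.setdefault(row[names_from], row[values_from])
    return list(wide.values())

wide_rows = pivot_wider_sketch(long_rows, "variable", "value", id_cols=["id"])
```
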
pull(var=None)[source]

Extract a column as a series

Parameters:

var (str) – Name of the column to extract. Defaults to the last column.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')
relocate(*args, before=None, after=None)[source]

Move a column or columns to a new position

Parameters:

*args (str, Expr) – Columns to move

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', before = 'c')
>>> df.relocate('b', after = 'c')
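
The effect of before and after on column order can be sketched on a plain list of column names. relocate_sketch is a hypothetical helper. Its fallback of moving columns to the front when neither argument is given is an assumption borrowed from dplyr's relocate(), not something stated here.

```python
def relocate_sketch(columns, *move, before=None, after=None):
    rest = [c for c in columns if c not in move]
    if before is not None:
        position = rest.index(before)
    elif after is not None:
        position = rest.index(after) + 1
    else:
        position = 0  # assumption: default to the front, as dplyr does
    return rest[:position] + list(move) + rest[position:]

columns = ["a", "b", "c"]
a_before_c = relocate_sketch(columns, "a", before="c")
b_after_c = relocate_sketch(columns, "b", after="c")
```
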
rename(*args, **kwargs)[source]

Rename columns

Parameters:
  • *args (dict) – Dictionary mapping old names to new names

  • **kwargs (str) – Keyword pairs of the form new_name = 'old_name'

Examples

>>> df = tp.Tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface
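
The two calling styles name the columns in opposite directions: the keyword form reads new = 'old', while the dictionary form reads {'old': 'new'}. A small sketch of how both could be normalised to one old-to-new mapping; rename_mapping is a hypothetical helper, not the library's internals.

```python
def rename_mapping(*args, **kwargs):
    """Normalise both call styles to a single {old: new} dict."""
    mapping = {}
    for dictionary in args:
        mapping.update(dictionary)  # pandas style: {old: new}
    for new, old in kwargs.items():
        mapping[old] = new          # dplyr style: new = 'old'
    return mapping

dplyr_style = rename_mapping(new_x="x")
pandas_style = rename_mapping({"x": "new_x"})
```
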
replace_null(replace=None)[source]

Replace null values

Parameters:

replace (dict) – Dictionary of column/replacement pairs

Examples

>>> df = tp.Tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))
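
The per-column replacement in the example can be sketched with plain dicts of lists. This assumes that columns missing from the replace dictionary are left untouched.

```python
data = {"x": [0, None], "y": [None, None]}
replace = {"x": 1, "y": 2}

filled = {
    name: [replace[name] if value is None and name in replace else value
           for value in values]
    for name, values in data.items()
}
```
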
separate(sep_col, into, sep='_', remove=True)[source]

Separate a character column into multiple columns

Parameters:
  • sep_col (str) – Column to split into multiple columns

  • into (list) – List of new column names

  • sep (str) – Separator to split on. Defaults to ‘_’

  • remove (bool) – If True, removes the input column from the output data frame

Examples

>>> df = tp.Tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
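
In plain Python, the split in the example amounts to the following sketch, where each piece of the split string is assigned to the matching name in into. How tidypolars treats values with more or fewer pieces than names is not covered here.

```python
x = ["a_a", "b_b", "c_c"]
into = ["left", "right"]

columns = {name: [] for name in into}
for value in x:
    # Pair each new column name with the corresponding piece.
    for name, piece in zip(into, value.split("_")):
        columns[name].append(piece)
```
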
set_names(nm=None)[source]

Change the column names of the data frame

Parameters:

nm (list) – A list of new names for the data frame

Examples

>>> df = tp.Tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])
select(*args)[source]

Select or drop columns

Parameters:

*args (str, Expr) – Columns to select

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))
slice(*args, by=None)[source]

Grab rows from a data frame

Parameters:
  • *args (int, list) – Rows to grab

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, by = 'c')
slice_head(n=5, *, by=None)[source]

Grab top rows from a data frame

Parameters:
  • n (int) – Number of rows to grab

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, by = 'c')
slice_tail(n=5, *, by=None)[source]

Grab bottom rows from a data frame

Parameters:
  • n (int) – Number of rows to grab

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, by = 'c')
summarise(*args, by=None, **kwargs)[source]

Alias for .summarize()

summarize(*args, by=None, **kwargs)[source]

Aggregate data with summary statistics

Parameters:
  • *args (Expr) – Column expressions to add or modify

  • by (str, list) – Columns to group by

  • **kwargs (Expr) – Column expressions to add or modify

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
tail(n=5, *, by=None)[source]

Alias for .slice_tail()

to_dict(as_series=True)[source]

Convert the data frame to a dictionary of columns

Parameters:

as_series (bool) – If True, returns the dict values as Series; if False, returns them as lists

Examples

>>> df.to_dict()
>>> df.to_dict(as_series = False)
to_pandas()[source]

Convert to a pandas DataFrame

Examples

>>> df.to_pandas()
to_polars()[source]

Convert to a polars DataFrame

Examples

>>> df.to_polars()
unite(col='_united', unite_cols=[], sep='_', remove=True)[source]

Unite multiple columns by pasting strings together

Parameters:
  • col (str) – Name of the new column

  • unite_cols (list) – List of columns to unite

  • sep (str) – Separator to use between values

  • remove (bool) – If True, removes the input columns from the data frame

Examples

>>> df = tp.Tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
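
Pasting the two string columns from the example together row by row can be sketched as follows. This is only the string logic, using the default separator '_'.

```python
a = ["a", "a", "a"]
b = ["b", "b", "b"]

# Join the values of each row with the separator.
united = ["_".join(parts) for parts in zip(a, b)]
```
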
write_csv(file=None, has_headers=True, sep=',')[source]

Write a data frame to a CSV file

write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs)[source]

Write a data frame to a Parquet file

desc(x)[source]

Mark a column to sort in descending order

from_polars(df)[source]

Convert from polars DataFrame to Tibble

Parameters:

df (DataFrame) – pl.DataFrame to convert to a Tibble

Examples

>>> tp.from_polars(df)
from_pandas(df)[source]

Convert from pandas DataFrame to Tibble

Parameters:

df (DataFrame) – pd.DataFrame to convert to a Tibble

Examples

>>> tp.from_pandas(df)