tidypolars.tibble_df

Classes

tibble

A data frame object that provides methods familiar to R tidyverse users.

Functions

`desc`(x)	Mark a column to sort in descending order
`as_tibble`(x)	Convert an object to a tibble
`is_tibble`(x)	Check whether an object is a tibble
`from_polars`(df)	Convert from polars DataFrame to tibble
`from_pandas`(df)	Convert from pandas DataFrame to tibble

Module Contents

class tibble(_data=None, **kwargs)

Bases: tidypolars.reexports.pl.DataFrame

A data frame object that provides methods familiar to R tidyverse users.

__dir__()

__repr__(): Printing method

_repr_html_()

Printing method for Jupyter

The number of rows and columns printed can be changed by setting the following environment variables:

POLARS_FMT_MAX_COLS: set the number of columns
POLARS_FMT_MAX_ROWS: set the number of rows

__copy__()

__str__(): Printing method

__getattribute__(attr)

__getitem__(col)

Get part of the DataFrame as a new DataFrame, Series, or scalar.

Parameters:

col –

Rows / columns to select. This is easiest to explain via example. Suppose we have a DataFrame with columns 'a', 'd', 'c', 'b'. Here is what various types of key would do:

df[0, 'a'] extracts the first element of column 'a' and returns a scalar.
df[0] extracts the first row and returns a DataFrame.
df['a'] extracts column 'a' and returns a Series.
df[0:2] extracts the first two rows and returns a DataFrame.
df[0:2, 'a'] extracts the first two rows from column 'a' and returns a Series.
df[0:2, 0] extracts the first two rows from the first column and returns a Series.
df[[0, 1], [0, 1, 2]] extracts the first two rows and the first three columns and returns a DataFrame.
df[0:2, ['a', 'c']] extracts the first two rows from columns 'a' and 'c' and returns a DataFrame.
df[:, 0:2] extracts all rows from the first two columns and returns a DataFrame.
df[:, 'a':'c'] extracts all rows and all columns positioned between 'a' and 'c' inclusive and returns a DataFrame. In our example, that would extract columns 'a', 'd', and 'c'.

Return type:

DataFrame, Series, or scalar, depending on the key.

Examples

>>> df = pl.DataFrame(
...     {"a": [1, 2, 3], "d": [4, 5, 6], "c": [1, 3, 2], "b": [7, 8, 9]}
... )
>>> df[0]
shape: (1, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
└─────┴─────┴─────┴─────┘
>>> df[0, "a"]
1
>>> df["a"]
shape: (3,)
Series: 'a' [i64]
[
    1
    2
    3
]
>>> df[0:2]
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
│ 2   ┆ 5   ┆ 3   ┆ 8   │
└─────┴─────┴─────┴─────┘
>>> df[0:2, "a"]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[0:2, 0]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[[0, 1], [0, 1, 2]]
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
└─────┴─────┴─────┘
>>> df[0:2, ["a", "c"]]
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ c   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 3   │
└─────┴─────┘
>>> df[:, 0:2]
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ d   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
>>> df[:, "a":"c"]
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
│ 3   ┆ 6   ┆ 2   │
└─────┴─────┴─────┘

arrange(*args)

Arrange/sort rows

Parameters: *args (str) – Columns to sort by. Wrap a column name in tp.desc() to sort it in descending order.

Examples

>>> df = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')
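To make the ordering rule concrete, here is a pure-Python sketch of multi-key sorting with a descending key. It is not tidypolars' implementation: rows are modeled as a list of dicts, and `desc()` and `arrange()` below are hypothetical stand-ins for `tp.desc()` and `tibble.arrange()`.

```python
def desc(name):
    # Hypothetical stand-in for tp.desc(): tag a column name as descending
    return ("desc", name)


def arrange(rows, *keys):
    """Sort a list of dict rows by several keys, like df.arrange(...)."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last produces a multi-key sort
    for key in reversed(keys):
        descending = isinstance(key, tuple) and key[0] == "desc"
        name = key[1] if descending else key
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


rows = [{"x": "a", "y": 0}, {"x": "a", "y": 1}, {"x": "b", "y": 2}]
print(arrange(rows, desc("x"), "y"))
```

The first key decides the order and later keys only break ties, which is why `'y'` still sorts ascending inside each value of `'x'`.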

as_dict(*, as_series=True)

Convert the data frame to a dictionary keyed by column name

Parameters: as_series (bool) – If True, the dict values are Series; if False, the dict values are lists

Examples

>>> df.as_dict()
>>> df.as_dict(as_series = False)

as_pandas()

Convert to a pandas DataFrame

Examples

>>> df.as_pandas()

as_polars()

Convert to a polars DataFrame

Examples

>>> df.as_polars()

bind_cols(*args)

Bind data frames by columns

Parameters: *args (tibble) – Data frames to bind by column

Examples

>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)

bind_rows(*args)

Bind data frames by row

Parameters: *args (tibble, list) – Data frames to bind by row

Examples

>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)

clone(): Very cheap deep clone

count(*args, sort=False, name='n')

Returns row counts of the dataset. If bare column names are provided, count() returns counts by group.

Parameters:

*args (str, Expr) – Columns to group by
sort (bool) – If True, sort the output in descending order of count
name (str) – The name of the new column in the output. Defaults to "n".

Examples

>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
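The grouping behavior of count() can be sketched in plain Python. This is an illustration of the semantics only, not tidypolars' code: rows are a list of dicts, `count()` here is a hypothetical stand-in, and groups come out in order of first appearance, which tidypolars does not necessarily guarantee.

```python
from collections import Counter


def count(rows, *group_cols, sort=False, name="n"):
    """Count rows, optionally per combination of values in group_cols."""
    if not group_cols:
        return [{name: len(rows)}]
    # Counter keeps groups in order of first appearance
    counts = Counter(tuple(row[c] for c in group_cols) for row in rows)
    out = [{**dict(zip(group_cols, key)), name: n} for key, n in counts.items()]
    if sort:
        # Largest groups first, mirroring sort=True
        out.sort(key=lambda row: row[name], reverse=True)
    return out


rows = [{"a": 0, "b": "a"}, {"a": 1, "b": "a"}, {"a": 2, "b": "b"}]
print(count(rows))
print(count(rows, "b", sort=True))
```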

distinct(*args)

Select distinct/unique rows

Parameters: *args (str, Expr) – Columns used to find distinct/unique rows

Examples

>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')

drop(*args)

Drop unwanted columns

Parameters: *args (str) – Columns to drop

Examples

>>> df.drop('x', 'y')

drop_null(*args)

Drop rows containing missing values

Parameters: *args (str) – Columns to check for missing values (defaults to all columns)

Examples

>>> df = tp.tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')
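A pure-Python sketch of which rows survive, using the same data as the example above. Rows are modeled as a list of dicts with None for missing values; `drop_null()` here is a hypothetical stand-in, not tidypolars' implementation.

```python
def drop_null(rows, *cols):
    """Keep only rows with no None in cols (all columns when cols is empty)."""
    def complete(row):
        check = cols or row.keys()
        return all(row[c] is not None for c in check)

    return [row for row in rows if complete(row)]


rows = [
    {"x": 1, "y": None, "z": 0},
    {"x": None, "y": "b", "z": 1},
    {"x": 3, "y": "c", "z": 2},
]
print(drop_null(rows))
print(drop_null(rows, "x"))
```

Restricting the check to specific columns keeps rows whose only missing values sit in other columns.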

equals(other, null_equal=True): Check if two tibbles are equal

glimpse()

Return a dense preview of the DataFrame.

The formatting shows one line per column so that wide dataframes display cleanly. Each line shows the column name, the data type, and the first few values.

fill(*args, direction='down', _by=None)

Fill in missing values with previous or next value

Parameters:

*args (str) – Columns to fill
direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]
by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
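The four direction values can be illustrated on a single column. The sketch below is plain Python operating on a list with None for missing values; `fill_down()`, `fill_up()` and `fill()` are hypothetical helpers showing the semantics, not tidypolars' implementation, and grouping is not modeled.

```python
def fill_down(values):
    """Replace each None with the closest non-None value above it."""
    out, last = [], None
    for value in values:
        if value is not None:
            last = value
        out.append(last)
    return out


def fill_up(values):
    """Replace each None with the closest non-None value below it."""
    return fill_down(values[::-1])[::-1]


def fill(values, direction="down"):
    """Apply one or two fill passes, in the order named by direction."""
    passes = {
        "down": [fill_down],
        "up": [fill_up],
        "downup": [fill_down, fill_up],
        "updown": [fill_up, fill_down],
    }
    for step in passes[direction]:
        values = step(list(values))
    return values


b = [None, 2, None, None, 5]
print(fill(b), fill(b, "downup"))
```

The combined directions matter at the edges: 'down' alone leaves a leading None, and the second 'up' pass in 'downup' fills it from below.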

filter(*args, _by=None)

Filter rows on one or more conditions

Parameters:

*args (Expr) – Conditions to filter by
by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), by = 'b')

full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')

Perform a full join

Parameters:

df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')

head(n=5, *, _by=None): Alias for .slice_head()

inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')

Perform an inner join

Parameters:

df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')

left_join(df, left_on=None, right_on=None, on=None, suffix='_right')

Perform a left join

Parameters:

df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
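The row semantics of a left join, including the suffix for clashing column names, can be sketched in plain Python. This is an illustration under simplifying assumptions, not tidypolars' implementation: rows are lists of dicts, only a single key column (`on`) is supported, and None keys never match. An inner join would keep only left rows with at least one match, and a full join would additionally keep unmatched right rows.

```python
def left_join(left, right, on, suffix="_right"):
    """Keep every left row; attach matching right rows, or None if unmatched."""
    right_cols = [c for c in (right[0] if right else {}) if c != on]
    out = []
    for lrow in left:
        # In this sketch a None key never matches anything
        matches = [r for r in right if lrow[on] is not None and r[on] == lrow[on]]
        for rrow in matches or [None]:
            row = dict(lrow)
            for c in right_cols:
                # Non-key columns that clash with a left column get the suffix
                name = c + suffix if c in lrow else c
                row[name] = None if rrow is None else rrow[c]
            out.append(row)
    return out


left = [{"x": "a", "v": 1}, {"x": "b", "v": 2}]
right = [{"x": "a", "v": 10, "w": "p"}]
print(left_join(left, right, "x"))
```

A left row that matches several right rows appears once per match, so the output can be longer than the left table.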

mutate(*args, _by=None, **kwargs)

Add or modify columns

Parameters:

*args (Expr) – Column expressions to add or modify
by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = row_number(), by = 'c')

pivot_longer(cols=everything(), names_to='name', values_to='value')

Pivot data from wide to long

Parameters:

cols (Expr) – List of the columns to pivot. Defaults to all columns.
names_to (str) – Name of the new “names” column.
values_to (str) – Name of the new “values” column

Examples

>>> df = tp.tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
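What "wide to long" means row by row can be shown with a small pure-Python sketch. Rows are a list of dicts and `pivot_longer()` below is a hypothetical stand-in; the row order it produces (one block per pivoted column) is a choice of this sketch, and tidypolars' output order may differ.

```python
def pivot_longer(rows, cols, names_to="name", values_to="value"):
    """Stack the columns in cols into a names column and a values column."""
    out = []
    for c in cols:
        for row in rows:
            # Every column not being pivoted is carried along as an identifier
            ids = {k: v for k, v in row.items() if k not in cols}
            out.append({**ids, names_to: c, values_to: row[c]})
    return out


rows = [{"id": "id1", "a": 1, "b": 1}, {"id": "id2", "a": 2, "b": 2}]
for r in pivot_longer(rows, ["a", "b"]):
    print(r)
```

Two input rows times two pivoted columns give four output rows.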

pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)

Pivot data from long to wide

Parameters:

names_from (str) – Column to get the new column names from.
values_from (str) – Column to get the new column values from.
id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.
values_fn (str) – How to aggregate multiple entries for the same cell. One of 'first', 'count', 'sum', 'max', 'min', 'mean', 'median', 'last'.
values_fill (str) – Strategy for filling missing/null values. One of 'backward', 'forward', 'mean', 'min', 'max', 'zero', 'one'.

Examples

>>> df = tp.tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
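The reverse reshaping, including the default id_cols and values_fn='first', can be sketched in plain Python. Rows are a list of dicts and `pivot_wider()` below is a hypothetical stand-in that leaves missing cells as None; it does not model the other values_fn or values_fill options.

```python
def pivot_wider(rows, names_from="name", values_from="value", id_cols=None):
    """Spread long rows into one row per id, keeping the first value per cell."""
    if id_cols is None:
        # Default: every column except names_from and values_from
        id_cols = [c for c in rows[0] if c not in (names_from, values_from)]
    names = list(dict.fromkeys(row[names_from] for row in rows))
    wide, filled = {}, set()
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        if key not in wide:
            # Start each id with every new column set to None (missing)
            wide[key] = {**dict(zip(id_cols, key)), **{n: None for n in names}}
        cell = (key, row[names_from])
        if cell not in filled:  # values_fn='first': ignore later duplicates
            wide[key][row[names_from]] = row[values_from]
            filled.add(cell)
    return list(wide.values())


rows = [{"id": 1, "variable": "a", "value": 1}, {"id": 1, "variable": "b", "value": 2}]
print(pivot_wider(rows, names_from="variable", values_from="value"))
```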

print()

pull(var=None)

Extract a column as a Series

Parameters: var (str) – Name of the column to extract. Defaults to the last column.

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')

relocate(*args, _before=None, _after=None)

Move a column or columns to a new position

Parameters: *args (str, Expr) – Columns to move

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', before = 'c')
>>> df.relocate('b', after = 'c')

rename(_mapping=None, **kwargs)

Rename columns

Parameters:

_mapping (dict) – Dictionary mapping old names to new names
**kwargs (str) – Pairs of the form new_name = 'old_name'

Examples

>>> df = tp.tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface

replace_null(replace=None)

Replace null values

Parameters: replace (dict) – Dictionary of column/replacement pairs

Examples

>>> df = tp.tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))

separate(sep_col, into, sep='_', remove=True)

Separate a character column into multiple columns

Parameters:

sep_col (str) – Column to split into multiple columns
into (list) – List of new column names
sep (str) – Separator to split on. Defaults to '_'
remove (bool) – If True removes the input column from the output data frame

Examples

>>> df = tp.tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
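A pure-Python sketch of the splitting rule, using the example data above. Rows are a list of dicts and `separate()` here is a hypothetical stand-in. How it treats values with extra pieces (kept in the last column) or too few pieces (padded with None) is an assumption of this sketch; tidypolars may handle those edge cases differently.

```python
def separate(rows, sep_col, into, sep="_", remove=True):
    """Split sep_col on sep into the columns named in into."""
    out = []
    for row in rows:
        # At most len(into) pieces; any remainder stays in the last piece
        parts = row[sep_col].split(sep, len(into) - 1)
        # Pad with None when a value has fewer pieces than into expects
        parts += [None] * (len(into) - len(parts))
        new = {k: v for k, v in row.items() if not (remove and k == sep_col)}
        new.update(zip(into, parts))
        out.append(new)
    return out


rows = [{"x": "a_a"}, {"x": "b_b"}, {"x": "c_c"}]
print(separate(rows, "x", into=["left", "right"]))
```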

set_names(nm=None)

Change the column names of the data frame

Parameters: nm (list) – A list of new names for the data frame

Examples

>>> df = tp.tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])

select(*args)

Select or drop columns

Parameters: *args (str, Expr) – Columns to select

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))

slice(*args, _by=None)

Grab rows from a data frame

Parameters:

*args (int, list) – Rows to grab
by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, by = 'c')

slice_head(n=5, *, _by=None)

Grab top rows from a data frame

Parameters:

n (int) – Number of rows to grab
by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, by = 'c')
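Taking the first n rows per group can be sketched in plain Python. Rows are a list of dicts and `slice_head()` below is a hypothetical stand-in that accepts a single grouping column; it keeps the original row order, whereas tidypolars may return groups in a different order.

```python
def slice_head(rows, n=5, by=None):
    """Keep the first n rows overall, or the first n rows within each group."""
    if by is None:
        return rows[:n]
    taken = {}
    out = []
    for row in rows:
        group = row[by]
        if taken.get(group, 0) < n:
            out.append(row)
            taken[group] = taken.get(group, 0) + 1
    return out


rows = [{"a": 0, "b": 0, "c": "a"}, {"a": 1, "b": 1, "c": "a"}, {"a": 2, "b": 2, "c": "b"}]
print(slice_head(rows, 1, by="c"))
```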

slice_tail(n=5, *, _by=None)

Grab bottom rows from a data frame

Parameters:

n (int) – Number of rows to grab
by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, by = 'c')

summarise(*args, _by=None, **kwargs): Alias for .summarize()

summarize(*args, _by=None, **kwargs)

Aggregate data with summary statistics

Parameters:

*args (Expr) – Summary expressions to compute
by (str, list) – Columns to group by
**kwargs (Expr) – Named summary expressions to compute

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
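The effect of grouping on a summary can be sketched in plain Python for a single mean. Rows are a list of dicts and `summarize_mean()` is a hypothetical helper that only illustrates one output row overall versus one output row per group; it is not how tidypolars evaluates expressions.

```python
from statistics import mean


def summarize_mean(rows, col, name, by=None):
    """Average col over all rows, or within each value of the by column."""
    groups = {}
    for row in rows:
        key = row[by] if by is not None else None
        groups.setdefault(key, []).append(row[col])
    if by is None:
        return [{name: mean(groups[None])}]
    return [{by: key, name: mean(values)} for key, values in groups.items()]


rows = [{"a": 0, "b": 0, "c": "a"}, {"a": 1, "b": 1, "c": "a"}, {"a": 2, "b": 2, "c": "b"}]
print(summarize_mean(rows, "a", "avg_a"))
print(summarize_mean(rows, "a", "avg_a", by="c"))
```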

tail(n=5, *, _by=None): Alias for .slice_tail()

unite(col='_united', unite_cols=[], sep='_', remove=True)

Unite multiple columns by pasting strings together

Parameters:

col (str) – Name of the new column
unite_cols (list) – List of columns to unite
sep (str) – Separator to use between values
remove (bool) – If True removes input columns from the data frame

Examples

>>> df = tp.tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
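A pure-Python sketch of pasting columns together, mirroring the example above. Rows are a list of dicts and `unite()` is a hypothetical stand-in; values are converted with str() before joining, which is an assumption of this sketch.

```python
def unite(rows, col="_united", unite_cols=(), sep="_", remove=True):
    """Paste the values of unite_cols together, separated by sep."""
    out = []
    for row in rows:
        new = {k: v for k, v in row.items() if not (remove and k in unite_cols)}
        new[col] = sep.join(str(row[c]) for c in unite_cols)
        out.append(new)
    return out


rows = [{"a": "a", "b": "b", "c": i} for i in range(3)]
print(unite(rows, "united_col", unite_cols=["a", "b"]))
```

The sketch uses a tuple as the default for unite_cols so that no mutable default list is shared between calls.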

write_csv(file=None, has_headers=True, sep=','): Write a data frame to a CSV file

write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs): Write a data frame to a Parquet file

property names

Get column names

Examples

>>> df.names

property ncol

Get number of columns

Examples

>>> df.ncol

property nrow

Get number of rows

Examples

>>> df.nrow

property plot

Access to polars plotting

Examples

>>> df.plot

desc(x): Mark a column to sort in descending order

as_tibble(x)

Convert an object to a tibble

Parameters: x (pl.DataFrame, pd.DataFrame, or dict) – Object to convert to a tibble

Examples

>>> tp.as_tibble(polars_df)

is_tibble(x)

Check whether an object is a tibble

Parameters: x (object) – Object to check

Examples

>>> tp.is_tibble(df)

from_polars(df)

Convert from polars DataFrame to tibble

Parameters: df (DataFrame) – pl.DataFrame to convert to a tibble

Examples

>>> tp.from_polars(df)

from_pandas(df)

Convert from pandas DataFrame to tibble

Parameters: df (DataFrame) – pd.DataFrame to convert to a tibble

Examples

>>> tp.from_pandas(df)