datapad.Sequence

class datapad.Sequence(_iterable=None)
The core object in datapad used to wrap sequence-like data types in a fluent-style API.

__init__(_iterable=None)
Instantiates a new Sequence object.
Parameters: _iterable (List, Set, Tuple, Iterator) – Any object that conforms to the Iterable API.
Methods
__init__([_iterable]): Instantiates a new Sequence object.
all(): Returns a standard Python iterator that you can use to lazily iterate over your sequence of data.
batch(size): Lazily combines elements in sequence into a list of length size.
cache([overwrite]): Greedily stores results of self.collect() in an internal variable that can later be used to reset the Sequence to the beginning of the iterator.
collect(): Eagerly returns all elements in sequence.
concat(seq): Concatenates another sequence to the end of this sequence.
count([distinct]): Eagerly counts the number of elements in sequence.
distinct(): Eagerly returns a new sequence with unique values.
drop(count): Lazily skips (drops) the first count elements.
drop_if(fn): Lazily applies fn to every element of the iterable and drops the elements where fn evaluates to True.
filter(fn): Alias for the Sequence.keep_if function.
first(): Eagerly returns the first element in sequence.
flatmap(fn): Lazily applies fn to every element of the iterable and chains the output into a single flattened sequence.
groupby([key, getter, eager_group]): Groups sequence using the key function.
join(other[, key, other_key]): Joins two sequences based on common field matches between the sequence and other.
keep_if(fn): Lazily applies fn to every element of the iterable and keeps only the elements where fn evaluates to True.
map(fn): Lazily applies fn to every element of the iterable.
next(): Eagerly returns the next element in sequence (alias for the first() function).
peek([count]): Returns a list of count elements without advancing the sequence iterator.
pmap(fn[, workers, ordered]): Lazily applies fn to every element of the iterable, in parallel using multiprocess.dummy.Pool.
reduce(fn[, initial]): Eagerly applies a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value.
reset(): Uses the internal cache of the Sequence to reset to the beginning of the iterator.
shuffle(): Eagerly shuffles your sequence and returns a newly created sequence containing the shuffled items.
sort([key]): Eagerly sorts your sequence and returns a newly created sequence containing the sorted items.
take(count): Lazily returns a sequence of the first count elements.
window(size[, stride]): Lazily slides and yields a window of length size over sequence.
zip_with_index(): Adds an index to each item in sequence.
all()
Returns a standard Python iterator that you can use to lazily iterate over your sequence of data.
>>> seq = Sequence(range(10))
>>> seq = seq.map(lambda v: v*2)
>>> i = 0
>>> for item in seq.all():
...     i += item
>>> i
90

batch(size)
Lazily combines elements in sequence into lists of length size. Any remainder is dropped if the sequence ends before a full batch of length size has been created.
Parameters: size (int) – The batch size.
Examples
>>> seq = Sequence(range(10))
>>> seq.batch(3).collect()
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]

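The remainder-dropping behavior described for batch can be sketched with the standard library alone. This is an illustration under stated assumptions, not datapad's implementation; the helper name batch_drop_remainder is hypothetical.

```python
from itertools import islice


def batch_drop_remainder(iterable, size):
    """Lazily yield lists of exactly `size` items; a short final batch is dropped.

    Hypothetical helper for illustration only (not part of datapad).
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if len(chunk) < size:
            return  # incomplete (or empty) final batch: drop it and stop
        yield chunk


batches = list(batch_drop_remainder(range(10), 3))
print(batches)
```

The final element 9 never appears because it cannot fill a batch of three, which matches the documented example.
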
cache(overwrite=False)
Greedily stores the results of self.collect() in an internal variable that can later be used to reset the Sequence to the beginning of the iterator. This is useful if you want to make multiple passes over the data. This function is meant to be used in conjunction with reset().
Parameters: overwrite (bool) – By default, multiple calls to the cache function will only cache the initial state of the iterator. If you set overwrite to True, the current state of the iterator is collected and its results replace the internal cache.
>>> seq = Sequence([1,2,3])
>>> _ = seq.cache()
>>> seq.collect()
[1, 2, 3]
>>> _ = seq.reset()
>>> seq.collect()
[1, 2, 3]
>>> _ = seq.reset()
>>> seq.next()
1
>>> _ = seq.cache(overwrite=True)
>>> seq.collect()
[2, 3]
>>> _ = seq.reset()
>>> seq.collect()
[2, 3]

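The cache-then-reset pattern can be mimicked with a small stand-in class. The sketch below only mirrors the behavior shown in the example above; the class name ReplayableIterator is hypothetical, and raising an error when reset() is called before cache() is an assumption, since the source does not say what datapad does in that case.

```python
class ReplayableIterator:
    """Hypothetical stand-in for the cache()/reset() pattern (not datapad's code)."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = None

    def cache(self, overwrite=False):
        # Snapshot only on the first call unless overwrite is requested.
        if self._cache is None or overwrite:
            self._cache = list(self._it)  # eagerly drain the remaining items
            self._it = iter(self._cache)  # continue iterating from the snapshot
        return self

    def reset(self):
        # Assumption: resetting without a cache is treated as an error here.
        if self._cache is None:
            raise RuntimeError("cache() must be called before reset()")
        self._it = iter(self._cache)
        return self

    def next(self):
        return next(self._it, None)

    def collect(self):
        return list(self._it)


r = ReplayableIterator([1, 2, 3]).cache()
first_pass = r.collect()
second_pass = r.reset().collect()

r.reset()
r.next()                      # consume the first item
r.cache(overwrite=True)       # snapshot the remaining items only
tail = r.collect()
tail_again = r.reset().collect()
```

After the overwrite, the cache holds only the items that had not yet been consumed, so every later reset() starts from that shorter snapshot.
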
collect()
Eagerly returns all elements in sequence. Collecting consumes the underlying iterator, so a second call returns an empty list.
>>> seq = Sequence(range(10))
>>> seq.collect()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> seq.collect()
[]

concat(seq)
Concatenates another sequence to the end of this sequence.
Examples
Concat two sequences together:
>>> s1 = Sequence(['a', 'b', 'c'])
>>> s2 = Sequence(range(3))
>>> s3 = s2.concat(s1)
>>> s3.collect()
[0, 1, 2, 'a', 'b', 'c']
Concat sequence with itself:
>>> seq = Sequence(range(5))
>>> seq = seq.concat(seq)
>>> seq.collect()
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

count(distinct=False)
Eagerly counts the number of elements in sequence.
Parameters: distinct (bool) – If True, counts the occurrences of each distinct value in sequence.
Returns: Either an integer count or a new sequence of tuples where the first value is the unique element and the second value is the number of times that element appeared in the sequence.
>>> seq = Sequence(range(5))
>>> seq.count()
5
>>> seq = Sequence(['a', 'a', 'b', 'b', 'c', 'c'])
>>> seq.count(distinct=True).collect()
[('a', 2), ('b', 2), ('c', 2)]

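The two return shapes of count (a plain integer versus per-value tuples) have close standard-library analogues. The snippet below is a sketch using collections.Counter and makes no claim about how datapad computes or orders its results.

```python
from collections import Counter

items = ['a', 'a', 'b', 'b', 'c', 'c']

# Plain count: a single integer.
total = sum(1 for _ in items)

# Distinct count: (value, occurrences) pairs. Counter keeps first-seen order
# on Python 3.7+, which is why the pairs come out as a, b, c here.
per_value = list(Counter(items).items())

print(total, per_value)
```
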
distinct()
Eagerly returns a new sequence with unique values.
>>> seq = Sequence(['a', 'a', 'b', 'b', 'c', 'c'])
>>> seq.distinct().collect()
['a', 'b', 'c']

drop(count)
Lazily skips (drops) the first count elements.
Without drop:
>>> seq = Sequence(range(5))
>>> seq.collect()
[0, 1, 2, 3, 4]
With drop:
>>> seq = Sequence(range(5))
>>> seq = seq.drop(2)
>>> seq.collect()
[2, 3, 4]

drop_if(fn)
Lazily applies fn to every element of the iterable and drops the elements where fn evaluates to True.
Parameters: fn (function) – Function with signature fn(element) -> bool to apply to every element of sequence. All elements for which fn evaluates to True are dropped.
>>> seq = Sequence(range(5))
>>> seq = seq.drop_if(lambda v: v > 1)
>>> seq.collect()
[0, 1]

filter(fn)
This is an alias for the Sequence.keep_if function.
>>> seq = Sequence(range(5))
>>> seq = seq.filter(lambda v: v > 1)
>>> seq.collect()
[2, 3, 4]

first()
Eagerly returns the first remaining element in sequence. Each call advances the iterator by one element.
Examples
Get first value in sequence:
>>> seq = Sequence(range(5))
>>> seq.first()
0
>>> seq.first()
1
Calling first on an empty sequence returns None:
>>> seq = Sequence([])
>>> seq.first()

flatmap(fn)
Lazily applies fn to every element of the iterable and chains the output into a single flattened sequence.
Parameters: fn (function) – Function with signature fn(element) -> iterable(element) to apply to every element of sequence.
Examples
>>> seq = Sequence(range(5))
>>> seq = seq.flatmap(lambda v: [v,v])
>>> seq.collect()
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

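Flat-mapping is the composition of a map step and a chain step. The sketch below expresses that with itertools; the free function flatmap here is a hypothetical stand-in and not the Sequence method itself.

```python
from itertools import chain


def flatmap(fn, iterable):
    """Lazily apply fn to each element and chain the resulting iterables."""
    return chain.from_iterable(map(fn, iterable))


flat = list(flatmap(lambda v: [v, v], range(5)))
print(flat)
```

Both map and chain.from_iterable are lazy, so nothing is computed until the result is consumed.
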
groupby(key=None, getter=None, eager_group=True)
Groups sequence using the key function.
Note: you must ensure elements are sorted by group before calling this function.
Parameters:
key (function) – Function used to determine what to use as a key for grouping.
getter (function) – Function to be applied to each element of a group.
eager_group (bool, default=True) – If True, eagerly convert each group from a lazy Sequence to a fully realized list.
Examples
Simple usage:
>>> from pprint import pprint
>>> seq = Sequence(['a', 'b', 'c', 'd', 'a', 'b', 'a', 'd'])
>>> res = seq.sort().groupby(key=lambda x: x).collect()
>>> res == [
...     ('a', ['a', 'a', 'a']),
...     ('b', ['b', 'b']),
...     ('c', ['c']),
...     ('d', ['d', 'd']),
... ]
True
Grouping with getter function:
>>> things = [("animal", "lion"),
...           ("plant", "maple tree"),
...           ("animal", "walrus"),
...           ("plant", "grass")]
>>> seq = Sequence(things)
>>> res = seq.sort().groupby(key=lambda x: x[0], getter=lambda x: x[1]).collect()
>>> res == [
...     ('animal', ['lion', 'walrus']),
...     ('plant', ['grass', 'maple tree'])
... ]
True

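The sorting requirement noted for groupby is easiest to see with the standard library's itertools.groupby, which only merges adjacent equal keys. Whether datapad delegates to itertools.groupby is an assumption; the snippet simply shows what happens to consecutive-key grouping when the input is not sorted.

```python
from itertools import groupby

data = ['a', 'b', 'c', 'd', 'a', 'b', 'a', 'd']

# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(data)]

# Sorting first makes equal keys adjacent, so each key appears exactly once.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(data))]

print(len(unsorted_groups), sorted_groups)
```
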
join(other, key=None, other_key=None)
Joins two sequences based on common field matches between the sequence and other. This is known as an “inner” join in SQL terminology.
Parameters:
other (Sequence) – A Sequence to join with the calling sequence.
key (function) – A function to retrieve the field to be used for matching between the two sequences. If key is None, then key will default to lambda x: x.
other_key (function) – A function to retrieve the field in other to be used for matching between the two sequences. If other_key is None, use key.
Returns: A sequence of 2-tuples (a, b) where a is an element in self that matched element b in other (based on the given field keys).
Examples
>>> a = Sequence([
...     {'id': 1, 'name': 'John'},
...     {'id': 2, 'name': 'Nayeon'},
...     {'id': 3, 'name': 'Reza'}
... ])
>>> b = Sequence([
...     {'id': 1, 'age': 2},
...     {'id': 2, 'age': 3}
... ])
>>> res = a.join(b, key=lambda x: x['id']).collect()
>>> res == [
...     ({'id': 1, 'name': 'John'}, {'id': 1, 'age': 2}),
...     ({'id': 2, 'name': 'Nayeon'}, {'id': 2, 'age': 3})
... ]
True

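An inner join with the key defaults described above can be sketched as a hash join: index one side by key, then probe it with the other. The function inner_join below is hypothetical and says nothing about how datapad implements join internally.

```python
from collections import defaultdict


def inner_join(left, right, key=None, other_key=None):
    """Yield (a, b) pairs whose keys match; unmatched elements are skipped."""
    key = key or (lambda x: x)      # default: the element itself is the key
    other_key = other_key or key    # default: reuse key for the other side
    index = defaultdict(list)
    for b in right:
        index[other_key(b)].append(b)
    for a in left:
        # .get avoids creating empty entries for keys that have no match
        for b in index.get(key(a), []):
            yield (a, b)


people = [
    {'id': 1, 'name': 'John'},
    {'id': 2, 'name': 'Nayeon'},
    {'id': 3, 'name': 'Reza'},
]
ages = [{'id': 1, 'age': 2}, {'id': 2, 'age': 3}]

joined = list(inner_join(people, ages, key=lambda x: x['id']))
print(joined)
```

The record with id 3 has no partner in the second list, so it is absent from the result, which is what makes the join "inner".
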
keep_if(fn)
Lazily applies fn to every element of the iterable and keeps only the elements where fn evaluates to True.
Parameters: fn (function) – Function with signature fn(element) -> bool to apply to every element of sequence. All elements for which fn evaluates to True are kept.
>>> seq = Sequence(range(5))
>>> seq = seq.keep_if(lambda v: v > 1)
>>> seq.collect()
[2, 3, 4]

map(fn)
Lazily applies fn to every element of the iterable.
Parameters: fn (function) – Function with signature fn(element) to apply to every element of sequence.
>>> seq = Sequence(range(10))
>>> seq = seq.map(lambda v: v*2)
>>> seq.collect()
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

next()
Eagerly returns the next element in sequence (alias for the first() function).
Examples
Get next value in sequence:
>>> seq = Sequence(range(5))
>>> seq.next()
0
>>> seq.next()
1
Calling next on an empty sequence returns None:
>>> seq = Sequence([])
>>> seq.next()

peek(count=None)
Returns a list of count elements without advancing the sequence iterator. If count is None, returns only the first element.
WARNING: this function will load up to count elements of your sequence into memory.
Examples
Peek at first element (notice the iterator does not advance):
>>> seq = Sequence(range(10))
>>> seq.peek()
0
>>> seq.peek()
0
Peek at first 3 elements:
>>> seq = Sequence(range(10))
>>> seq.peek(3)
[0, 1, 2]
>>> seq.peek(3)
[0, 1, 2]

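Peeking at a one-shot iterator means pulling items off it and then putting them back in front. The sketch below does that with itertools.chain; unlike the method documented above, this hypothetical helper returns a replacement iterator instead of updating an object in place, and it is not datapad's implementation.

```python
from itertools import chain, islice


def peek(iterator, count=None):
    """Return (peeked, iterator) where the returned iterator still yields everything."""
    n = 1 if count is None else count
    head = list(islice(iterator, n))   # these items are held in memory
    restored = chain(head, iterator)   # put the peeked items back in front
    if count is None:
        return (head[0] if head else None), restored
    return head, restored


it = iter(range(10))
first_item, it = peek(it)
first_three, it = peek(it, 3)
everything = list(it)
```

Because the peeked items are buffered in a list, memory use grows with count, which is the reason for the warning above.
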
pmap(fn, workers=3, ordered=True)
Lazily applies fn to every element of the iterable, in parallel using multiprocess.dummy.Pool. The returned sequence may appear in a different order than the input sequence if you set ordered to False.
THIS FUNCTION IS EXPERIMENTAL
Parameters:
fn (function) – Function with signature fn(element) -> element to apply to every element of sequence.
workers (int) – Number of parallel workers to use (default: 3). These workers are implemented as Python threads.
ordered (bool) – Whether to yield results in the same order in which items arrive. You may get better performance by setting this to False (default: True).
>>> seq = Sequence(range(10))
>>> seq = seq.pmap(lambda v: v*2)
>>> seq.collect()
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> seq = Sequence(range(10))
>>> seq = seq.pmap(lambda v: v*2, workers=1, ordered=False)
>>> seq.collect()
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

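The ordered versus unordered trade-off can be illustrated with a thread-backed pool. The sketch below uses the standard library's multiprocessing.dummy.Pool, which is assumed here to behave like the pool named above; it shows the general technique and is not a copy of pmap.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool from the standard library


def double(v):
    return v * 2


with Pool(3) as pool:
    # imap preserves input order; results are yielded as they become available in order.
    ordered = list(pool.imap(double, range(10)))
    # imap_unordered yields each result as soon as any worker finishes it.
    unordered = list(pool.imap_unordered(double, range(10)))

print(ordered)
print(sorted(unordered))
```

With more than one worker the unordered variant may return the same values in a different order, so only its sorted form is compared against the ordered result.
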
reduce(fn, initial=None)
Eagerly applies a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.
Parameters:
fn (function) – Function with signature fn(acc, current_item) -> acc_next.
initial (Any) – An initial value that acc will be set to. If not provided, this function will set the first element of the sequence as the initial value.
Examples
Reduce with accumulator initialized to first element:
>>> seq = Sequence(range(3))
>>> seq.reduce(lambda acc, item: acc + item)
3
Reduce with accumulator set to a custom initial value:
>>> seq = Sequence(range(3))
>>> seq.reduce(lambda acc, item: acc + item, initial=10)
13

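The role of the initial value is the same as in the standard library's functools.reduce, which the description above paraphrases. The snippet below uses functools.reduce directly to show the three cases: no initial value, a custom initial value, and an empty input with an initial value.

```python
from functools import reduce
from operator import add

# No initial value: the first element seeds the accumulator, i.e. (0 + 1) + 2.
without_initial = reduce(add, range(3))

# Custom initial value: it is placed before the items, i.e. ((10 + 0) + 1) + 2.
with_initial = reduce(add, range(3), 10)

# Empty input: the initial value is returned unchanged.
empty_with_initial = reduce(add, [], 10)

print(without_initial, with_initial, empty_with_initial)
```
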
reset()
Uses the internal cache of the Sequence to reset to the beginning of the iterator.
>>> seq = Sequence([1, 2, 3])
>>> _ = seq.cache()
>>> seq.collect()
[1, 2, 3]
>>> _ = seq.reset()
>>> seq.collect()
[1, 2, 3]

shuffle()
Eagerly shuffles your sequence and returns a newly created sequence containing the shuffled items.
WARNING: this function loads the entirety of your sequence into memory.
>>> import random
>>> random.seed(0)
>>> seq = Sequence(range(5))
>>> seq.shuffle().collect()
[2, 1, 0, 4, 3]

sort(key=None)
Eagerly sorts your sequence and returns a newly created sequence containing the sorted items.
WARNING: this function loads the entirety of your sequence into memory.
>>> seq = Sequence([2, 1, 0, 4, 3])
>>> seq.sort().collect()
[0, 1, 2, 3, 4]

take(count)
Lazily returns a sequence of the first count elements.
>>> seq = Sequence(range(5))
>>> seq.take(2).collect()
[0, 1]

window(size, stride=1)
Lazily slides and yields a window of length size over sequence. Any remainder is dropped if the sequence ends before a window of length size has been filled.
Parameters:
size (int) – The window size.
stride (int) – How many elements to skip for each advancement in window position.
Examples
>>> seq = Sequence(range(10))
>>> seq.window(2).collect()
[[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]]
>>> seq = Sequence(range(10))
>>> seq.window(3, stride=2).collect()
[[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]
>>> seq = Sequence(range(10))
>>> seq.window(2, stride=4).collect()
[[0, 1], [4, 5], [8, 9]]
>>> seq = Sequence(range(10))
>>> seq.window(1, stride=1).collect()
[[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]

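The interaction between size and stride can be sketched with a bounded deque: keep the last size items and emit the buffer whenever the window's start index is a multiple of stride. The free function window below is a hypothetical illustration and not datapad's implementation.

```python
from collections import deque


def window(iterable, size, stride=1):
    """Lazily yield full windows of `size` items, advancing `stride` items each time."""
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive integers")
    buf = deque(maxlen=size)  # automatically discards the oldest item when full
    for i, item in enumerate(iterable):
        buf.append(item)
        start = i - size + 1  # index of the oldest buffered item once the buffer is full
        if start >= 0 and start % stride == 0:
            yield list(buf)


overlapping = list(window(range(10), 3, stride=2))
skipping = list(window(range(10), 2, stride=4))
print(overlapping)
print(skipping)
```

A stride smaller than size produces overlapping windows, a stride larger than size skips elements between windows, and a trailing partial window is never emitted.
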
zip_with_index()
Adds an index to each item in sequence, yielding (index, item) tuples.
>>> seq = Sequence(['a', 'b', 'c'])
>>> seq.zip_with_index().collect()
[(0, 'a'), (1, 'b'), (2, 'c')]
