datapad.io

Convenience functions for creating Sequences from files and other input sources.

Functions

read_csv(path_or_paths) Construct a Sequence from csv text files
read_json(path_or_paths[, lines, ignore_errors]) Construct a Sequence from json text files
read_text(path_or_paths[, lines]) Construct a Sequence from text files
datapad.io.read_csv(path_or_paths)

Construct a Sequence from csv text files

Parameters: path_or_paths – str or list of strings. A path, or list of paths. Paths may contain glob patterns like “data/metadata-*-a.txt”.
>>> from datapad.io import read_csv
>>> seq = read_csv(["data/meta-*.csv"])
>>> seq.collect() # doctest: +SKIP
[["foo", "bar"],
 ["1", "2"],
 ["3", "4"]]
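The behavior shown above (glob patterns expanded, each csv row becoming one list of strings) can be approximated with the standard library alone. This is a minimal sketch, not datapad's implementation: `iter_csv_rows` is a hypothetical helper, and sorting the matched paths is an assumption made here to keep the order deterministic.

```python
import csv
import glob


def iter_csv_rows(path_or_paths):
    """Yield each csv row as a list of strings (hypothetical helper, not datapad code)."""
    # Accept a single path or a list of paths, as the parameter description states.
    patterns = [path_or_paths] if isinstance(path_or_paths, str) else list(path_or_paths)
    for pattern in patterns:
        # Assumption: matched files are visited in sorted order.
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as handle:
                for row in csv.reader(handle):
                    yield row
```

Calling `list(iter_csv_rows(["data/meta-*.csv"]))` would then produce a list of rows comparable to the result of `seq.collect()` in the example above.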
datapad.io.read_json(path_or_paths, lines=True, ignore_errors=False)

Construct a Sequence from json text files

Parameters:
  • path_or_paths – str or list of strings. A path, or list of paths. Paths may contain glob patterns like “data/metadata-*-a.txt”.
  • lines – bool (default: True) If True, each element of the sequence comes from decoding one line of a json-lines text file (see: http://jsonlines.org/examples/). If False, each element of the sequence is obtained by running json.loads on the entire contents of a text file.
  • ignore_errors – bool (default: False) If True, skip any elements that raise an error while being loaded as json.
>>> from datapad.io import read_json
>>> seq = read_json(["data/meta-*.json"], lines=True)
>>> seq.collect() # doctest: +SKIP
[{"dog": 1}, {"dog": 2}]
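To make the interaction between `lines` and `ignore_errors` concrete, the following sketch mimics the documented semantics using only `json` and `glob`. `iter_json` is a hypothetical stand-in rather than datapad's own code, and it assumes that a decode failure surfaces as `json.JSONDecodeError`.

```python
import glob
import json


def iter_json(path_or_paths, lines=True, ignore_errors=False):
    """Yield decoded json values (hypothetical helper, not datapad code)."""
    patterns = [path_or_paths] if isinstance(path_or_paths, str) else list(path_or_paths)
    for pattern in patterns:
        # Assumption: matched files are visited in sorted order.
        for path in sorted(glob.glob(pattern)):
            with open(path) as handle:
                # lines=True: decode each line separately; otherwise decode the whole file.
                chunks = handle if lines else [handle.read()]
                for chunk in chunks:
                    try:
                        value = json.loads(chunk)
                    except json.JSONDecodeError:
                        if ignore_errors:
                            continue  # skip the undecodable element
                        raise
                    yield value
```

With `ignore_errors=True`, a malformed line is dropped and decoding continues with the next line. With the default `ignore_errors=False`, the first malformed line raises.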
datapad.io.read_text(path_or_paths, lines=True)

Construct a Sequence from text files

Parameters:
  • path_or_paths – str or list of strings. A path, or list of paths. Paths may contain glob patterns like “data/metadata-*-a.txt”.
  • lines – bool (default: True) If True, each element of the sequence comes from reading one line of a text file. If False, each element of the sequence is the entire contents of one text file.
>>> from datapad.io import read_text
>>> seq = read_text(["data/meta-*.txt"], lines=True)
>>> seq.collect() # doctest: +SKIP
["foo_a", "foo_b", "bar_a", "bar_b"]
>>> seq = read_text(["data/meta-*.txt"], lines=False)
>>> seq.collect() # doctest: +SKIP
["foo_a\nfoo_b", "bar_a\nbar_b"]
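The difference between the two `lines` settings can be sketched with plain file reads. `iter_text` below is a hypothetical helper, not part of datapad. It assumes that trailing newlines are stripped in line mode, which matches the example output above, and that matched files are read in sorted order.

```python
import glob


def iter_text(path_or_paths, lines=True):
    """Yield lines or whole-file strings (hypothetical helper, not datapad code)."""
    patterns = [path_or_paths] if isinstance(path_or_paths, str) else list(path_or_paths)
    for pattern in patterns:
        # Assumption: matched files are visited in sorted order.
        for path in sorted(glob.glob(pattern)):
            with open(path) as handle:
                if lines:
                    # One element per line, without the trailing newline.
                    for line in handle:
                        yield line.rstrip("\n")
                else:
                    # One element per file, with the contents unchanged.
                    yield handle.read()
```

In line mode the number of elements equals the total number of lines across all matched files. In whole-file mode it equals the number of matched files.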