google.datalab.bigquery Module

Google Cloud Platform library - BigQuery Functionality.

class google.datalab.bigquery.CSVOptions(delimiter=u',', skip_leading_rows=0, encoding=u'utf-8', quote=u'"', allow_quoted_newlines=False, allow_jagged_rows=False)[source]

Initialize an instance of CSV options.

Parameters:
  • delimiter – The separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data as raw binary (default ‘,’).
  • skip_leading_rows – A number of rows at the top of a CSV file to skip (default 0).
  • encoding – The character encoding of the data, either ‘utf-8’ (the default) or ‘iso-8859-1’.
  • quote – The value used to quote data sections in a CSV file; default ‘"’. If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also enable allow_quoted_newlines.
  • allow_quoted_newlines – If True, allow quoted data sections in CSV files that contain newline characters (default False).
  • allow_jagged_rows – If True, accept rows in CSV files that are missing trailing optional columns; the missing values are treated as nulls (default False).
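
For illustration, a minimal sketch of building a CSVOptions object and passing it to a load (see Table.load below); the table name and GCS path are placeholders, and default credentials from the global Context are assumed:

  import google.datalab.bigquery as bq

  # Tab-delimited files with one header row and quoted fields that may
  # contain newlines.
  options = bq.CSVOptions(delimiter='\t',
                          skip_leading_rows=1,
                          quote='"',
                          allow_quoted_newlines=True)

  # Placeholder table and GCS path; csv_options is consumed by Table.load.
  table = bq.Table('my_dataset.my_table')
  table.load('gs://my-bucket/data/*.csv',
             mode='append',
             source_format='csv',
             csv_options=options)
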
class google.datalab.bigquery.Dataset(name, context=None)[source]

Represents a BigQuery dataset: a named collection of tables and views.

Initializes an instance of a Dataset.

Parameters:
  • name – the name of the dataset, as a string or (project_id, dataset_id) tuple.
  • context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
Raises:

Exception if the name is invalid.
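
As a hedged example, a typical create-and-enumerate flow (the project and dataset ids are placeholders):

  import google.datalab.bigquery as bq

  ds = bq.Dataset(('my-project', 'my_dataset'))   # a name string also works

  if not ds.exists():
      ds.create(friendly_name='Demo dataset',
                description='Dataset created from the reference example.')

  # Enumerate the tables and views in the dataset.
  for table in ds.tables():
      print(table.name)
  for view in ds.views():
      print(view.name)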

create(friendly_name=None, description=None)[source]

Creates the Dataset with the specified friendly name and description.

Parameters:
  • friendly_name – (optional) the friendly name for the dataset if it is being created.
  • description – (optional) a description for the dataset if it is being created.
Returns:

The Dataset.

Raises:

Exception if the Dataset could not be created.

delete(delete_contents=False)[source]

Issues a request to delete the dataset.

Parameters:delete_contents – if True, any tables and views in the dataset will be deleted. If False and the dataset is non-empty, an exception will be raised.
Returns:None on success.
Raises:Exception if the delete fails (including if the dataset did not exist).
description

The description of the dataset, if any.

Raises:Exception if the dataset exists but the metadata for the dataset could not be retrieved.
exists()[source]

Checks if the dataset exists.

Returns:True if the dataset exists; False otherwise.
Raises:Exception if the dataset exists but the metadata for the dataset could not be retrieved.
friendly_name

The friendly name of the dataset, if any.

Raises:Exception if the dataset exists but the metadata for the dataset could not be retrieved.
name

The DatasetName named tuple (project_id, dataset_id) for the dataset.

tables()[source]

Returns an iterator for iterating through the Tables in the dataset.

update(friendly_name=None, description=None)[source]

Selectively updates Dataset information.

Parameters:
  • friendly_name – if not None, the new friendly name.
  • description – if not None, the new description.

views()[source]

Returns an iterator for iterating through the Views in the dataset.

class google.datalab.bigquery.DatasetName(project_id, dataset_id)

A namedtuple for Dataset names.

Parameters:
  • project_id – the project id for the dataset.
  • dataset_id – the dataset id for the dataset.
dataset_id

Alias for field number 1

project_id

Alias for field number 0

class google.datalab.bigquery.Datasets(context=None)[source]

Iterator class for enumerating the datasets in a project.

Initialize the Datasets object.

Parameters:context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
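
A short sketch of enumerating datasets; it assumes a default project and credentials are configured in the global Context:

  import google.datalab.bigquery as bq

  # Iterate over all datasets in the default project.
  for ds in bq.Datasets():
      print(ds.name)
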
class google.datalab.bigquery.ExternalDataSource(source, source_format=u'csv', csv_options=None, ignore_unknown_values=False, max_bad_records=0, compressed=False, schema=None)[source]

Create an external table for a GCS object.

Parameters:
  • source – the URL of the source object(s). Can include a wildcard ‘*’ at the end of the item name. Can be a single source or a list.
  • source_format – the format of the data, ‘csv’ or ‘json’; default ‘csv’.
  • csv_options – For CSV files, the options such as quote character and delimiter.
  • ignore_unknown_values – If True, accept rows that contain values that do not match the schema; the unknown values are ignored (default False).
  • max_bad_records – The maximum number of bad records that are allowed (and ignored) before returning an ‘invalid’ error in the Job result (default 0).
  • compressed – whether the data is GZ compressed or not (default False). Note that compressed data can be used as an external data source but cannot be loaded into a BQ Table.
  • schema – the schema of the data. This is required for this table to be used as an external data source or to be loaded using a Table object that itself has no schema (default None).
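
A hedged sketch of querying CSV files in GCS directly through an external data source; the bucket path and field names are placeholders, and referencing the source by the name it is given in data_sources follows common datalab usage:

  import google.datalab.bigquery as bq

  schema = bq.Schema([{'name': 'name', 'type': 'STRING'},
                      {'name': 'value', 'type': 'FLOAT'}])
  sales = bq.ExternalDataSource('gs://my-bucket/sales/*.csv',
                                source_format='csv',
                                csv_options=bq.CSVOptions(skip_leading_rows=1),
                                schema=schema)

  # The data source is referenced in the SQL by its key in data_sources.
  q = bq.Query('SELECT name, SUM(value) AS total FROM sales GROUP BY name',
               data_sources={'sales': sales})
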
class google.datalab.bigquery.Query(sql, env=None, udfs=None, data_sources=None, subqueries=None)[source]

Represents a Query object that encapsulates a BigQuery SQL query.

This object can be used to execute SQL queries and retrieve results.

Initializes an instance of a Query object.

Parameters:
  • sql – the BigQuery SQL query string to execute
  • env – a dictionary containing objects from the query execution context, used to get references to UDFs, subqueries, and external data sources referenced by the query
  • udfs – list of UDF names referenced in the SQL, or dictionary of names and UDF objects
  • data_sources – list of external data source names referenced in the SQL, or dictionary of names and data source objects
  • subqueries – list of subquery names referenced in the SQL, or dictionary of names and Query objects
Raises:

Exception if expansion of any variables failed.

data_sources

Get a dictionary of external data sources referenced by the query.

dry_run(context=None, query_params=None)[source]

Dry run a query to check its validity and return some useful statistics.

Parameters:
  • context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
  • query_params – a dictionary containing query parameter types and values, passed to BigQuery.
Returns:

A dict with ‘cacheHit’ and ‘totalBytesProcessed’ fields.

Raises:

An exception if the query was malformed.
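
For example, a minimal dry run against a public sample table (assuming the default Standard SQL dialect and default credentials):

  import google.datalab.bigquery as bq

  q = bq.Query('SELECT word, word_count '
               'FROM `bigquery-public-data.samples.shakespeare` '
               'LIMIT 10')
  stats = q.dry_run()
  print(stats['cacheHit'], stats['totalBytesProcessed'])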

execute(output_options=None, sampling=None, context=None, query_params=None)[source]

Initiate the query, wait for it to complete, and return a QueryJob.

Parameters:
  • output_options – a QueryOutput object describing how to execute the query
  • sampling – sampling function to use. No sampling is done if None. See bigquery.Sampling
  • context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
Returns:

A Job object that can be used to get the query results, or export to a file or dataframe.

Raises:

Exception if query could not be executed.
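
A hedged sketch of a synchronous execution that materializes the results into a Pandas dataframe; the result() accessor on the returned Job is assumed from the base google.datalab Job API:

  import google.datalab.bigquery as bq

  q = bq.Query('SELECT corpus, COUNT(*) AS n '
               'FROM `bigquery-public-data.samples.shakespeare` '
               'GROUP BY corpus ORDER BY n DESC')

  output = bq.QueryOutput.dataframe(max_rows=20)
  df = q.execute(output_options=output).result()
  print(df.head())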

execute_async(output_options=None, sampling=None, context=None, query_params=None)[source]

Initiate the query and return a QueryJob.

Parameters:
  • output_options – a QueryOutput object describing how to execute the query
  • sampling – sampling function to use. No sampling is done if None. See bigquery.Sampling
  • context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
  • query_params – a dictionary containing query parameter types and values, passed to BigQuery.
Returns:

A Job object that can wait on creating a table or exporting to a file. If the output is a table, the Job object additionally has run statistics and query results.

Raises:

Exception if query could not be executed.

static from_table(table, fields=None)[source]

Return a Query for the given Table object

Parameters:
  • table – the Table object to construct a Query out of
  • fields – the fields to return. If None, all fields will be returned. This can be a string which will be injected into the Query after SELECT, or a list of field names.
Returns:

A Query object that will return the specified fields from the records in the Table.

static from_view(view)[source]

Return a Query for the given View object

Parameters:view – the View object to construct a Query out of
Returns:A Query object with the same sql as the given View object
sql

Get the SQL for the query.

subqueries

Get a dictionary of subqueries referenced by the query.

udfs

Get a dictionary of UDFs referenced by the query.

class google.datalab.bigquery.QueryOutput[source]

Create a BigQuery output type object. Do not call this directly; use factory methods.

static dataframe(start_row=0, max_rows=None, use_cache=True)[source]

Construct a query output object where the result is a dataframe

Parameters:
  • start_row – the row of the table at which to start the export (default 0).
  • max_rows – an upper limit on the number of rows to export (default None).
  • use_cache – whether to use cached results or not (default True).
static file(path, format='csv', csv_delimiter=',', csv_header=True, compress=False, use_cache=True)[source]

Construct a query output object where the result is either a local file or a GCS path

Note that there are two jobs that may need to be run sequentially, one to run the query, and the second to extract the resulting table. These are wrapped by a single outer Job.

If the query has already been executed and you would prefer to get a Job just for the extract, you can call extract[_async] on the QueryResultsTable returned by the query.

Parameters:
  • path – the destination path. Can either be a local or GCS URI (starting with gs://)
  • format – the format to use for the exported data; one of ‘csv’, ‘json’, or ‘avro’ (default ‘csv’).
  • csv_delimiter – for CSV exports, the field delimiter to use (default ‘,’).
  • csv_header – for CSV exports, whether to include an initial header line (default True).
  • compress – whether to compress the data on export. Compression is not supported for AVRO format (default False). Applies only to GCS URIs.
  • use_cache – whether to use cached results or not (default True).
static table(name=None, mode='create', use_cache=True, priority='interactive', allow_large_results=False)[source]

Construct a query output object where the result is a table

Parameters:
  • name – the result table name as a string or TableName; if None (the default), then a temporary table will be used.
  • mode – one of ‘create’, ‘overwrite’ or ‘append’. If ‘create’ (the default), the request will fail if the table exists.
  • use_cache – whether to use past query results or ignore cache. Has no effect if destination is specified (default True).
  • priority – one of ‘batch’ or ‘interactive’ (default). ‘interactive’ jobs should be scheduled to run quickly but are subject to rate limits; ‘batch’ jobs could be delayed by as much as three hours but are not rate-limited.
  • allow_large_results – whether to allow large results; i.e. compressed data over 100MB. This is slower and requires a name to be specified (default False).
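
For illustration, the three factory methods side by side (the table name, dataset, and GCS path are placeholders):

  import google.datalab.bigquery as bq

  # Write results to a named table, replacing it if it exists.
  to_table = bq.QueryOutput.table('my_dataset.query_results',
                                  mode='overwrite',
                                  allow_large_results=True)

  # Fetch results directly as a Pandas dataframe.
  to_df = bq.QueryOutput.dataframe(max_rows=1000)

  # Export results to sharded, compressed CSV files in GCS.
  to_file = bq.QueryOutput.file('gs://my-bucket/results/out-*.csv',
                                format='csv',
                                compress=True)

  q = bq.Query('SELECT 1 AS x')
  q.execute(output_options=to_table)
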
class google.datalab.bigquery.QueryResultsTable(name, context, job, is_temporary=False)[source]

A subclass of Table specifically for Query results.

The primary differences are the additional properties job_id and sql.

Initializes an instance of a Table object.

Parameters:
  • name – the name of the table either as a string or a 3-part tuple (projectid, datasetid, name).
  • context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
  • job – the QueryJob associated with these results.
  • is_temporary – if True, this is a short-lived table for intermediate results (default False).
is_temporary

Whether this is a short-lived table or not.

job

The QueryJob object that caused the table to be populated.

job_id

The ID of the query job that caused the table to be populated.

sql

The SQL statement for the query that populated the table.

class google.datalab.bigquery.QueryStats(total_bytes, is_cached)[source]

A wrapper for statistics returned by a dry run query. Useful so we can get an HTML representation in a notebook.

class google.datalab.bigquery.Sampling[source]

Provides common sampling strategies.

Sampling strategies can be used for sampling tables or queries.

They are implemented as functions that take in a SQL statement representing the table or query that should be sampled, and return a new SQL statement that limits the result set in some manner.

static default(fields=None, count=5)[source]

Provides a simple default sampling strategy which limits the result set by a count.

Parameters:
  • fields – an optional list of field names to retrieve.
  • count – optional number of rows to limit the sampled results to.
Returns:

A sampling function that can be applied to get a random sampling.

static hashed(field_name, percent, fields=None, count=0)[source]

Provides a sampling strategy based on hashing and selecting a percentage of data.

Parameters:
  • field_name – the name of the field to hash.
  • percent – the percentage of the resulting hashes to select.
  • fields – an optional list of field names to retrieve.
  • count – optional maximum count of rows to pick.
Returns:

A sampling function that can be applied to get a hash-based sampling.

static random(percent, fields=None, count=0)[source]

Provides a sampling strategy that picks a semi-random set of rows.

Parameters:
  • percent – the percentage of rows to select.
  • fields – an optional list of field names to retrieve.
  • count – maximum number of rows to limit the sampled results to (default 0).
Returns:

A sampling function that can be applied to get some random rows. In order for this to provide a good random sample, percent should be chosen to be ~count/#rows, where #rows is the number of rows in the object (query, view or table) being sampled. The rows will be returned in order; i.e. the order itself is not randomized.

static sorted(field_name, ascending=True, fields=None, count=5)[source]

Provides a sampling strategy that picks from an ordered set of rows.

Parameters:
  • field_name – the name of the field to sort the rows by.
  • ascending – whether to sort in ascending direction or not.
  • fields – an optional list of field names to retrieve.
  • count – optional number of rows to limit the sampled results to.
Returns:

A sampling function that can be applied to get the initial few rows.
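
A hedged example of applying a sampling strategy when executing a query; the field names refer to a public sample table:

  import google.datalab.bigquery as bq

  q = bq.Query('SELECT * FROM `bigquery-public-data.samples.shakespeare`')

  # Hash 'word' and keep roughly 1% of the rows, returning two fields.
  sampling = bq.Sampling.hashed('word', percent=1, fields=['word', 'corpus'])
  df = q.execute(output_options=bq.QueryOutput.dataframe(),
                 sampling=sampling).result()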

class google.datalab.bigquery.Schema(definition=None)[source]

Represents the schema of a BigQuery table as a flattened list of objects representing fields.

Each field object has name, type, mode and description properties. Nested fields get flattened with their fully-qualified names. So a Schema that has an object A with nested field B will be represented as [(name: ‘A’, ...), (name: ‘A.b’, ...)].

Initializes a Schema from its raw JSON representation, a Pandas Dataframe, or a list.

Parameters:definition – a definition of the schema as a list of dictionaries with ‘name’ and ‘type’ entries and possibly ‘mode’ and ‘description’ entries. Only used if no data argument was provided. ‘mode’ can be ‘NULLABLE’, ‘REQUIRED’ or ‘REPEATED’. For the allowed types, see: https://cloud.google.com/bigquery/preparing-data-for-bigquery#datatypes
find(name)[source]

Get the index of a field in the flattened list given its (fully-qualified) name.

Parameters:name – the fully-qualified name of the field.
Returns:The index of the field, if found; else -1.
static from_data(source)[source]

Infers a table/view schema from its JSON representation, a list of records, or a Pandas dataframe.

Parameters:source – the Pandas Dataframe, a dictionary representing a record, a list of heterogeneous data (a single record) or homogeneous data (a list of records) from which to infer the schema, or a definition of the schema as a list of dictionaries with ‘name’ and ‘type’ entries and possibly ‘mode’ and ‘description’ entries. ‘mode’ can be ‘NULLABLE’, ‘REQUIRED’ or ‘REPEATED’. For the allowed types, see: https://cloud.google.com/bigquery/preparing-data-for-bigquery#datatypes

Note that there is potential ambiguity when passing a list of lists or a list of dicts between whether that should be treated as a list of records or a single record that is a list. The heuristic used is to check the length of the entries in the list; if they are equal then a list of records is assumed. To avoid this ambiguity you can instead use the Schema.from_record method, which assumes a single record in either list-of-values or dictionary-of-key-values form.

Returns:A Schema for the data.
static from_record(source)[source]

Infers a table/view schema from a single record that can contain a list of fields or a dictionary of fields. The type of the elements is used for the types in the schema. For a dict, key names are used for column names while for a list, the field names are simply named ‘Column1’, ‘Column2’, etc. Note that if using a dict you may want to use an OrderedDict to ensure column ordering is deterministic.

Parameters:source – The list of field values or dictionary of key/values.
Returns:A Schema for the data.
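
Some hedged examples of the three ways to obtain a Schema (an explicit definition, a single record, and a Pandas dataframe):

  import google.datalab.bigquery as bq
  import pandas as pd
  from collections import OrderedDict

  # Explicit definition.
  schema = bq.Schema([{'name': 'name', 'type': 'STRING'},
                      {'name': 'score', 'type': 'FLOAT', 'mode': 'NULLABLE'}])

  # Inferred from a single record; OrderedDict keeps column order deterministic.
  from_rec = bq.Schema.from_record(OrderedDict([('name', 'alice'),
                                                ('score', 0.5)]))

  # Inferred from a dataframe.
  from_df = bq.Schema.from_data(pd.DataFrame({'name': ['a'], 'score': [1.0]}))

  print(from_rec.find('score'))   # index in the flattened field list, or -1
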
class google.datalab.bigquery.SchemaField(name, type, mode=u'NULLABLE', description=u'')[source]

Represents a single field in a Table schema.

This has the properties:

  • name: the flattened, fully-qualified name of the field.
  • type: the type of the field as a string (‘INTEGER’, ‘BOOLEAN’, ‘FLOAT’, ‘STRING’ or ‘TIMESTAMP’).
  • mode: the mode of the field; ‘NULLABLE’ by default.
  • description: a description of the field, if known; empty string by default.
class google.datalab.bigquery.Table(name, context=None)[source]

Represents a Table object referencing a BigQuery table.

Initializes an instance of a Table object. The Table need not exist yet.

Parameters:
  • name – the name of the table either as a string or a 3-part tuple (projectid, datasetid, name). If a string, it must have the form ‘<project>:<dataset>.<table>’ or ‘<dataset>.<table>’.
  • context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
Raises:

Exception if the name is invalid.

create(schema, overwrite=False)[source]

Create the table with the specified schema.

Parameters:
  • schema – the schema to use to create the table. Should be a list of dictionaries, each containing at least a pair of entries, ‘name’ and ‘type’. See https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
  • overwrite – if True, delete the table first if it exists. If False and the table exists, creation will fail and raise an Exception.
Returns:

The Table instance.

Raises:

Exception if the table couldn’t be created, or if it already exists and overwrite was False.
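
A minimal sketch of creating a table with an explicit schema; the dataset is assumed to exist and the names are placeholders:

  import google.datalab.bigquery as bq

  t = bq.Table('my_dataset.metrics')
  if not t.exists():
      t.create(schema=[{'name': 'ts', 'type': 'TIMESTAMP'},
                       {'name': 'value', 'type': 'FLOAT'}])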

delete()[source]

Delete the table.

Returns:True if the Table no longer exists; False otherwise.
exists()[source]

Checks if the table exists.

Returns:True if the table exists; False otherwise.
Raises:Exception if there was an error requesting information about the table.
extract(destination, format=u'csv', csv_delimiter=u',', csv_header=True, compress=False)[source]

Exports the table to GCS; blocks until complete.

Parameters:
  • destination – the destination URI(s). Can be a single URI or a list.
  • format – the format to use for the exported data; one of ‘csv’, ‘json’, or ‘avro’ (default ‘csv’).
  • csv_delimiter – for CSV exports, the field delimiter to use. Defaults to ‘,’
  • csv_header – for CSV exports, whether to include an initial header line. Default true.
  • compress – whether to compress the data on export. Compression is not supported for AVRO format. Defaults to False.
Returns:

A Job object for the completed export Job if it was started successfully; else None.

extract_async(destination, format=u'csv', csv_delimiter=u',', csv_header=True, compress=False)[source]

Starts a job to export the table to GCS.

Parameters:
  • destination – the destination URI(s). Can be a single URI or a list.
  • format – the format to use for the exported data; one of ‘csv’, ‘json’, or ‘avro’ (default ‘csv’).
  • csv_delimiter – for CSV exports, the field delimiter to use. Defaults to ‘,’
  • csv_header – for CSV exports, whether to include an initial header line. Default true.
  • compress – whether to compress the data on export. Compression is not supported for AVRO format. Defaults to False.
Returns:

A Job object for the export Job if it was started successfully; else None.

insert(data, include_index=False, index_name=None)[source]

Insert the contents of a Pandas DataFrame or a list of dictionaries into the table.

The insertion will be performed using at most 500 rows per POST, and at most 10 POSTs per second, as BigQuery has some limits on streaming rates.

Parameters:
  • data – the DataFrame or list to insert.
  • include_index – whether to include the DataFrame or list index as a column in the BQ table.
  • index_name – for a list, if include_index is True, this should be the name for the index. If not specified, ‘Index’ will be used.
Returns:

The table.

Raises:

Exception if the table doesn’t exist, the table’s schema differs from the data’s schema, or the insert failed.
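
A hedged example of streaming rows from a dataframe and from a list of dictionaries into an existing table whose schema matches the data (names and values are placeholders):

  import google.datalab.bigquery as bq
  import pandas as pd

  t = bq.Table('my_dataset.metrics')

  df = pd.DataFrame({'ts': pd.to_datetime(['2017-01-01', '2017-01-02']),
                     'value': [1.5, 2.5]})
  t.insert(df)

  # A list of dictionaries works too.
  t.insert([{'ts': '2017-01-03 00:00:00', 'value': 3.5}])
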
is_temporary

Whether this is a short-lived table or not. Always False for non-QueryResultsTables.

job

For tables resulting from executing queries, the job that created the table.

Default is None for a Table object; this is overridden by QueryResultsTable.

length

Get the length of the table (number of rows). We don’t use __len__ as this may return -1 for ‘unknown’.

load(source, mode=u'create', source_format=u'csv', csv_options=None, ignore_unknown_values=False, max_bad_records=0)[source]

Load the table from GCS.

Parameters:
  • source – the URL of the source object(s). Can include a wildcard ‘*’ at the end of the item name. Can be a single source or a list.
  • mode – one of ‘create’, ‘append’, or ‘overwrite’. ‘append’ or ‘overwrite’ will fail if the table does not already exist, while ‘create’ will fail if it does. The default is ‘create’. If ‘create’, the schema will be inferred if necessary.
  • source_format – the format of the data, ‘csv’ or ‘json’; default ‘csv’.
  • csv_options – if source format is ‘csv’, additional options as a CSVOptions object.
  • ignore_unknown_values – if True, accept rows that contain values that do not match the schema; the unknown values are ignored (default False).
  • max_bad_records – the maximum number of bad records that are allowed (and ignored) before returning an ‘invalid’ error in the Job result (default 0).
Returns:

A Job object for the completed load Job if it was started successfully; else None.

load_async(source, mode=u'create', source_format=u'csv', csv_options=None, ignore_unknown_values=False, max_bad_records=0)[source]

Starts importing a table from GCS and returns a Job without waiting for it to complete.

Parameters:
  • source – the URL of the source object(s). Can include a wildcard ‘*’ at the end of the item name. Can be a single source or a list.
  • mode – one of ‘create’, ‘append’, or ‘overwrite’. ‘append’ or ‘overwrite’ will fail if the table does not already exist, while ‘create’ will fail if it does. The default is ‘create’. If ‘create’, the schema will be inferred if necessary.
  • source_format – the format of the data, ‘csv’ or ‘json’; default ‘csv’.
  • csv_options – if source format is ‘csv’, additional options as a CSVOptions object.
  • ignore_unknown_values – If True, accept rows that contain values that do not match the schema; the unknown values are ignored (default False).
  • max_bad_records – the maximum number of bad records that are allowed (and ignored) before returning an ‘invalid’ error in the Job result (default 0).
Returns:

A Job object for the import if it was started successfully or None if not.

Raises:

Exception if the load job failed to be started or invalid arguments were supplied.
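
For illustration, a load from GCS using CSVOptions; load blocks until the job finishes, while load_async returns the Job immediately (the table name and GCS path are placeholders):

  import google.datalab.bigquery as bq

  t = bq.Table('my_dataset.events')
  job = t.load('gs://my-bucket/events/*.csv',
               mode='append',
               source_format='csv',
               csv_options=bq.CSVOptions(skip_leading_rows=1),
               max_bad_records=10)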

metadata

Retrieves metadata about the table.

Returns:A TableMetadata object.
Raises:Exception if the request could not be executed or the response was malformed.
name

The TableName named tuple (project_id, dataset_id, table_id, decorator) for the table.

range(start_row=0, max_rows=None)[source]

Get an iterator to iterate through a set of table rows.

Parameters:
  • start_row – the row of the table at which to start the iteration (default 0)
  • max_rows – an upper limit on the number of rows to iterate through (default None)
Returns:

A row iterator.

schema

Retrieves the schema of the table.

Returns:A Schema object containing a list of schema fields and associated metadata.
Raises:Exception if the request could not be executed or the response was malformed.
snapshot(at)[source]

Return a new Table which is a snapshot of this table at the specified time.

Parameters:at – the time of the snapshot. This can be a Python datetime (absolute) or timedelta (relative to current time). The result must be after the table was created and no more than seven days in the past. Passing None will get a reference to the oldest snapshot.

Note that using a datetime will get a snapshot at an absolute point in time, while a timedelta will provide a varying snapshot; any queries issued against such a Table will be done against a snapshot that has an age relative to the execution time of the query.

Returns:A new Table object referencing the snapshot.
Raises:An exception if this Table is already decorated, or if the time specified is invalid.
to_dataframe(start_row=0, max_rows=None)[source]

Exports the table to a Pandas dataframe.

Parameters:
  • start_row – the row of the table at which to start the export (default 0)
  • max_rows – an upper limit on the number of rows to export (default None)
Returns:

A Pandas dataframe containing the table data.
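
As an example, reading rows from a public sample table (any table name accepted by the Table constructor works here):

  import google.datalab.bigquery as bq

  t = bq.Table('bigquery-public-data:samples.shakespeare')

  print(t.length)                              # row count, or -1 if unknown
  df = t.to_dataframe(start_row=0, max_rows=100)

  # range() iterates over individual rows instead of building a dataframe.
  for row in t.range(start_row=0, max_rows=5):
      print(row)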

to_file(destination, format=u'csv', csv_delimiter=u',', csv_header=True)[source]

Save the results to a local file in CSV format.

Parameters:
  • destination – path on the local filesystem for the saved results.
  • format – the format to use for the exported data; currently only ‘csv’ is supported.
  • csv_delimiter – for CSV exports, the field delimiter to use. Defaults to ‘,’
  • csv_header – for CSV exports, whether to include an initial header line. Default true.
Raises:

An Exception if the operation failed.

update(friendly_name=None, description=None, expiry=None, schema=None)[source]

Selectively updates Table information.

Any parameters that are omitted or None are not updated.

Parameters:
  • friendly_name – if not None, the new friendly name.
  • description – if not None, the new description.
  • expiry – if not None, the new expiry time, either as a DateTime or milliseconds since epoch.
  • schema – if not None, the new schema: either a list of dictionaries or a Schema.
window(begin, end=None)[source]

Return a new Table limited to the rows added to this Table during the specified time range.

Parameters:
  • begin

    the start time of the window. This can be a Python datetime (absolute) or timedelta (relative to current time). The result must be after the table was created and no more than seven days in the past.

    Note that using a relative value will provide a varying snapshot, not a fixed snapshot; any queries issued against such a Table will be done against a snapshot that has an age relative to the execution time of the query.

  • end – the end time of the window; if None, then the current time is used. The types and interpretation of values are the same as for begin.
Returns:

A new Table object referencing the window.

Raises:

An exception if this Table is already decorated, or if the time specified is invalid.

class google.datalab.bigquery.TableMetadata(table, info)[source]

Represents metadata about a BigQuery table.

Initializes a TableMetadata instance.

Parameters:
  • table – the Table object this belongs to.
  • info – The BigQuery information about this table as a Python dictionary.
created_on

The creation timestamp.

description

The description of the table if it exists.

expires_on

The timestamp for when the table will expire, or None if unknown.

friendly_name

The friendly name of the table if it exists.

modified_on

The timestamp for when the table was last modified.

refresh()[source]

Refresh the metadata.

rows

The number of rows within the table, or -1 if unknown.

size

The size of the table in bytes, or -1 if unknown.

class google.datalab.bigquery.TableName(project_id, dataset_id, table_id, decorator)

A namedtuple for Table names.

Parameters:
  • project_id – the project id for the table.
  • dataset_id – the dataset id for the table.
  • table_id – the table id for the table.
  • decorator – the optional decorator for the table (for windowing/snapshotting).
dataset_id

Alias for field number 1

decorator

Alias for field number 3

project_id

Alias for field number 0

table_id

Alias for field number 2

class google.datalab.bigquery.UDF(name, code, return_type, params=u'', language=u'js', imports=u'')[source]

Represents a BigQuery UDF declaration.

Initializes a UDF object from its pieces.

Parameters:
  • name – the name of the JavaScript function.
  • code – function body implementing the logic.
  • return_type – BigQuery data type of the function return. See supported data types in the BigQuery docs
  • params – dictionary of parameter names and types
  • language – see list of supported languages in the BigQuery docs
  • imports – a list of GCS paths containing further support code.
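
A hedged sketch of declaring a JavaScript UDF and referencing it by name from a query; the function body, the type strings, and the table name are placeholders:

  import google.datalab.bigquery as bq

  double_it = bq.UDF(name='double_it',
                     code='return x * 2;',
                     return_type='FLOAT64',
                     params={'x': 'FLOAT64'},
                     language='js')

  q = bq.Query('SELECT double_it(value) AS doubled FROM my_dataset.metrics',
               udfs={'double_it': double_it})
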
class google.datalab.bigquery.View(name, context=None)[source]

An implementation of a BigQuery View.

Initializes an instance of a View object.

Parameters:
  • name – the name of the view either as a string or a 3-part tuple (projectid, datasetid, name). If a string, it must have the form ‘<project>:<dataset>.<view>’ or ‘<dataset>.<view>’.
  • context – an optional Context object providing project_id and credentials. If a specific project id or credentials are unspecified, the default ones configured at the global level are used.
Raises:

Exception if the name is invalid.

create(query)[source]

Creates the view with the specified query.

Parameters:query – the query to use for the View; either a string containing a SQL query or a Query object.
Returns:The View instance.
Raises:Exception if the view couldn’t be created or already exists and overwrite was False.
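
A minimal sketch of defining a view over an existing table and turning it back into a Query (names are placeholders):

  import google.datalab.bigquery as bq

  v = bq.View('my_dataset.daily_totals')
  if not v.exists():
      v.create('SELECT DATE(ts) AS day, SUM(value) AS total '
               'FROM my_dataset.metrics GROUP BY day')

  q = bq.Query.from_view(v)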
delete()[source]

Removes the view if it exists.

description

The description of the view if it exists.

exists()[source]

Whether the view’s Query has been executed and the view is available or not.

friendly_name

The friendly name of the view if it exists.

name

The name for the view as a named tuple.

query

The Query that defines the view.

schema

Retrieves the schema of the view.

Returns:A Schema object containing a list of schema fields and associated metadata.
Raises:Exception if the request could not be executed or the response was malformed.
update(friendly_name=None, description=None, query=None)[source]

Selectively updates View information.

Any parameters that are None (the default) are not applied in the update.

Parameters:
  • friendly_name – if not None, the new friendly name.
  • description – if not None, the new description.
  • query – if not None, a new query string for the View.