Shared Flashcard Set

Details

Google Cloud Certification 2019
Training for Google Cloud Certification
95
Computer Science
Professional
05/05/2019


Cards

Term
HIVE
Definition
- SQL-like wrapper for querying data in Hadoop
- Very useful for BI/OLAP
Term
HBASE
Definition
Wide-column NoSQL database on top of HDFS, for fast random reads/writes and sequential scans by row key.
Term
Pig
Definition
Data manipulation language for writing scripts that transform unstructured data into structured data
Term
Spark
Definition
Distributed computing framework in the Hadoop ecosystem. Uses RDDs (Resilient Distributed Datasets).
Term
Oozie
Definition
Workflow orchestration for Hadoop jobs. Not widely used; Dataproc covers some of the same ground.
Term
Kafka
Definition
Stream processing of unbounded data sets
Term
Google replacement for Hive
Definition
BigQuery
Term
What are the GCP Compute options?
Definition
- App Engine: PaaS, serverless, ops-free
- Container Engine: clusters of machines running Kubernetes
- Compute Engine: IaaS, fully controllable down to the OS
Term
What option should I choose for a simple static web site?
Definition
Cloud Storage
Term
What option should I choose for a web site needing SSL, HTTPS, and a CDN?
Definition
Firebase Hosting with Cloud Storage
Term
What option should I choose for a web site requiring load balancing and autoscaling with fine-grained control?
Definition
GCP - Google Compute Engine
Term
What kinds of services are available when choosing GCE for web hosting?
Definition

- Cloud Launcher for web app deployment
- Choice of machine sizes and disk sizes
- Storage options: Cloud Storage buckets, persistent disk, local SSD
- Database technologies: Cloud SQL (MySQL, PostgreSQL) and NoSQL options
- Load balancing at any stack level, by GCP or third-party products
- DevOps tools

Term
How long do you have to react to a preemptible GCE instance's shutdown notice?
Definition
30 seconds
Term
What is the longest period a preemptible GCE instance can run?
Definition
24 hours
Term
What are the storage options for a GCE instance?
Definition
- A small root persistent disk containing the OS is included
- Additional options:
  - Persistent disk: standard or SSD
  - Local SSD
  - Cloud Storage buckets
Term
What is available with GCE for logging and monitoring?
Definition
Stackdriver
Term
How can you keep data available on a container restart?
Definition
Use GCE Persistent Disk
Term
What two environment choices are available for App Engine?
Definition
- Standard: Java7, Python 2.7, Go, PHP
- Flexible: Java8, Python 3.x, .NET, other choices
Term
What are the three levels of abstraction for choosing a platform for running your application?
Definition
- Compute Engine
- Container Engine
- App Engine
Term
How many VM instances does AppEngine use?
Definition
None that you manage yourself; App Engine provisions and scales the underlying instances for you.
Term

Which of the following is a PaaS option for hosting web apps?

  • Compute Engine VM
  • Container Engine instance
  • Cloud storage with Firebase hosting
  • App Engine standard or flexible environment
Definition
App Engine standard or flexible environment
Term

Which of the following is an IaaS option for hosting web apps on GCP?

  • App Engine standard environment
  • Container Engine instance
  • Compute Engine instance
  • Cloud storage with Firebase hosting
Definition
Compute Engine instance
Term

Rank the following storage options from most expensive to cheapest (per GB)

  • Cloud storage > SSD persistent disks > Local SSD > standard persistent
  • Local SSD > SSD persistent disks > standard persistent > Cloud storage
  • Local SSD > standard persistent disks > standard SSD > Cloud storage
  • SSD (any type) > Cloud Storage > standard persistent
Definition

Local SSD > SSD persistent disks > standard persistent > Cloud storage

Term

Rank the following options in scope of access

  • Cloud storage - global, persistent (SSD and standard) - zonal, local SSD - instance
  • Cloud storage - regional, persistent (SSD and standard) - regional, local SSD - instance
  • All storage options offer global access (but billing rates vary)
  • Cloud storage - global, persistent (SSD and standard) - regional, local SSD - zonal
Definition

Cloud storage - global, persistent (SSD and standard) - zonal, local SSD - instance

Term

How do storage options differ with container engine instances relative to compute engine instances?

  • Cloud storage is global access for container engine instances; regional for compute engine VMs
  • No difference in storage options
  • BigQuery and BigTable can be used from containers but not from raw compute engine
  • Container disks are ephemeral by default; need to use a specific abstraction to make them persistent
Definition

Container disks are ephemeral by default; need to use a specific abstraction to make them persistent

Term
Use cases and Hadoop storage technologies
Definition
[image]
Term
Use cases and GCP storage technologies
Definition
[image]
Term
Name the storage options for compute and block storage
Definition
  • Persistent Disks
    • Standard
    • SSDs
  • Local SSD
Term
Name the storage option for media and blob storage
Definition
Cloud Storage
Term
Name the bucket storage classes for Cloud Storage
Definition
  • Multi-regional - for frequent access globally
  • Regional - for frequent access regionally
  • Nearline - access once a month max
  • Coldline - access once a year
Term
For storing media and blobs, what are the storage technologies in Hadoop and GCP?  What is the advantage of the GCP option?
Definition
  • Hadoop - HDFS
  • GCP - Cloud storage

HDFS requires a NameNode.  GCP Cloud Storage does not.

Term
What are Hadoop and GCP storage technologies for SQL interface atop file data?  What is the advantage of the GCP option?
Definition
  • Hadoop - Hive
  • GCP - BigQuery

BigQuery is far faster than Hive.  It uses columnar storage, whereas Hive sits on top of HDFS.

Term
Can BigQuery be used for OLTP requiring ACID?
Definition
No.  ACID transactions are not supported by BigQuery.
Term
What GCP storage technologies are available for OLTP transaction processing?
Definition
  • Cloud SQL
  • Cloud Spanner
Term
What GCP technology is available for OLAP?
Definition
BigQuery
Term
What GCP relational databases technologies are available?
Definition
  • Cloud SQL
  • Cloud Spanner
Term
What GCP offering supports open source RDBMSs, and which are supported?  Which GCP offering is not open source, and what are the differences?
Definition
  • Cloud SQL supports these:
    • MySQL
    • PostgreSQL
  • Cloud Spanner is proprietary
  • Cloud Spanner supports automatic horizontal scaling
Term
What GCP offering is available for document data storage?  What are its properties?
Definition

DataStore

  • Multi-key value store
  • Very fast hash based indexes
  • Very fast lookups of non-sequential keys
  • Same time for queries regardless of data size; query performance depends on result size
  • Best for read-heavy workloads with relatively few writes
  • Offers transaction support
Term
What are the mobile specific GCP offerings?
Definition
  • Cloud Storage for Firebase: object (blob) storage with mobile SDK access
  • Firebase Realtime DB: fast, random access with mobile SDK access.
Term
What is the command line tool for managing Google Cloud Storage?
Definition
gsutil
Term
What feature can be used to set up automatic deletion of objects in Cloud Storage after a given time period?
Definition
Lifecycle management
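A lifecycle rule is just configuration. As a minimal sketch, assuming the JSON format accepted by `gsutil lifecycle set` (a top-level `rule` list of action/condition pairs), the Python below builds a rule that deletes objects once they are older than a chosen number of days. The helper name `delete_after_days` and the 30-day value are illustrative only.

```python
import json

def delete_after_days(days: int) -> dict:
    """Build a lifecycle config with one rule: delete objects older than `days` days."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                # "age" is the object's age in days, counted from its creation time
                "condition": {"age": days},
            }
        ]
    }

config = delete_after_days(30)  # 30 days is only an example value
# Save this output to a file and apply it with:
#   gsutil lifecycle set <config-file> gs://<bucket>
print(json.dumps(config, indent=2))
```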
Term
When is the Storage Transfer Service preferred over gsutil for loading data?
Definition
  • When transferring from another cloud provider
  • When copying files from on-premises for the first time
Term
What is the GCP equivalent of HDFS?
Definition
Cloud Storage.  Dataproc, GCP's managed Hadoop and Spark service, uses Cloud Storage instead of HDFS.
Term
Cloud Spanner Concepts
Definition
  • Hotspotting
  • Interleaving
  • Splits
  • Primary keys required; secondary indexes available (not a feature in HBase)
  • Index directives (force a query to use a specific index)
  • STORING clause: store copies of additional columns in a secondary index
  • Non-normal data types: arrays; arrays of arrays; Structs in queries only.
  • Stronger than ACID: external consistency, which guarantees the order in which transactions commit
  • Transaction modes: locking read-write; read-only; single read call -- doesn't use locking.
  • Staleness timestamp bounds: Latest; bounded; exact
Term
What are advantages of BigTable over HBase?
Definition
  • Scalability
  • Low admin burden
  • Cluster resize without downtime
  • Many more column families before performance drops
Term
Are HBase and BigTable NoSQL databases?  Can SQL be used to query these?
Definition
Yes.  No.
Term

Are these supported in BigTable?

  • Multiple table operations
  • Indexes
  • Constraints
  • Grouping
  • Joins
  • Aggregates
  • CRUD operations
Definition
No to all except CRUD operations: only basic CRUD operations are supported.
Term
What are the 4 dimensions of data in HBase/BigTable?
Definition
  • Row ID
  • Column family
  • Column
  • Timestamp
Term
In what order are BigTable row keys stored?
Definition
Ascending (lexicographic) order
Term
What are two approaches to avoid hotspotting in BigTable?
Definition
  1. Field promotion: e.g., reverse the URL (domain) order
  2. Salting: prefix the key with a hash of the key value
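Both techniques only change how the row key string is built. The sketch below is plain Python, not the Bigtable client API; the helper names, the example URL and key, and the choice of 8 salt buckets are all hypothetical.

```python
import hashlib

def promote_fields(url: str) -> str:
    """Field promotion via reversed domain: 'www.example.com/index.html' becomes
    'com.example.www/index.html', so keys no longer all start with 'www.'
    and rows from the same domain sort next to each other."""
    host, _, path = url.partition("/")
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + ("/" + path if path else "")

def salt_key(key: str, buckets: int = 8) -> str:
    """Salting: prefix the key with a hash-derived bucket number so that
    monotonically increasing keys (e.g. timestamps) are spread across the key space.
    hashlib is used because Python's built-in hash() for strings changes between runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return f"{bucket:02d}#{key}"

print(promote_fields("www.example.com/index.html"))  # com.example.www/index.html
print(salt_key("20190505120000-sensor42"))          # prefixed with a 2-digit bucket number
```

The trade-off of salting: a scan over a contiguous range of the original keys now has to be issued once per bucket.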
Term
How does BigTable improve performance over time?
Definition

It observes read and write patterns and redistributes data among shards.

Term
What should you check when BigTable is not performing well?
Definition
  • Poor schema design
  • Workload too small (< 300 GB)
  • Used only in short bursts
  • Cluster too small
  • Cluster just started
  • Using HDD instead of SSD
Term
What are two fast operations in BigTable?
Definition
  • Lookup by row key
  • sequential scans
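Both fast operations rely on rows being kept sorted by row key. The toy in-memory model below is an illustration only, not the Bigtable client API: a point lookup is a binary search, and a prefix scan jumps to the first matching key and then reads forward sequentially. The keys and values are made up.

```python
import bisect

# Toy table: row key -> value, with the keys kept in ascending (lexicographic) order
rows = {
    "com.example.www/a": "page A",
    "com.example.www/b": "page B",
    "com.google.www/x": "page X",
    "org.apache.beam/docs": "Beam docs",
}
sorted_keys = sorted(rows)

def lookup(key):
    """Point lookup by row key: binary search over the sorted keys, O(log n)."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return rows[key]
    return None

def prefix_scan(prefix):
    """Sequential scan: find the first key >= prefix, then read forward while keys match."""
    i = bisect.bisect_left(sorted_keys, prefix)
    result = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        result.append(sorted_keys[i])
        i += 1
    return result

print(lookup("com.google.www/x"))   # page X
print(prefix_scan("com.example."))  # ['com.example.www/a', 'com.example.www/b']
```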
Term
What is the GCP document database product?
Definition
DataStore
Term

Which of these affects response time in datastore?

  • Database size
  • Result set size
Definition
Result set size
Term
What is the index called that is used in a DataStore query, and how is it chosen?
Definition

It is called the Perfect index, and the index is chosen in this order:

  1. Choose equality indexes
  2. If there is no equality condition, choose an inequality index (only one inequality condition is allowed in a query)
  3. Choose an index that satisfies the sort condition.
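One way to read this card: the perfect index for a query lists its properties as equality-filter properties first, then the single inequality-filter property, then the sort-order properties. The plain-Python helper below is a hypothetical sketch of that ordering, not the Datastore API, and the example query is made up.

```python
def perfect_index_properties(equality_props, inequality_props, sort_props):
    """Return the property order of a 'perfect index' for a query:
    1) properties with equality filters, 2) the single property with inequality
    filters, 3) properties used in sort orders. Duplicates keep their first position."""
    if len(set(inequality_props)) > 1:
        raise ValueError("inequality filters may use only one property")
    ordered = []
    for prop in [*equality_props, *inequality_props, *sort_props]:
        if prop not in ordered:
            ordered.append(prop)
    return ordered

# Hypothetical query: category = 'book' AND price < 20, ORDER BY price, rating
print(perfect_index_properties(["category"], ["price"], ["price", "rating"]))
# ['category', 'price', 'rating']
```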
Term
What are some restrictions and limitations of using DataStore?
Definition
  • Restrictions
    • No Joins
    • Only one inequality condition allowed
    • Cannot filter based on subquery results
  • Limitations
    • Updates are slow
    • limited ACID support
Term
What consistency options are available for DataStore?
Definition
  • Strong consistency
  • Eventual consistency
Term
When can a schema be created in BigQuery?
Definition
  • At creation time
  • During initial load.
Term
What are two ways of loading data to Big Query?
Definition
  • Batch loads
  • Streaming loads
Term
What data formats are supported for loading data into BigQuery?
Definition
  • CSV
  • JSON
  • Avro
  • Cloud DataStore backups
Term
What feature is available above and beyond HIVE to enhance schema-on-read?
Definition
Schema Auto-Detection
Term
What are the four ways of querying BigQuery?  Give important details.
Definition
  • Interactive queries
  • Batch queries - which BigQuery will run when resources allow within 24 hours.
  • Views
    • Authorized view for security
    • Row level permissions available
    • Can't export data from a view
    • can't mix standard and legacy SQL
    • No functions or wildcards may be used
    • Limit of 1000 views
  • Partitioned tables
    • Can be partitioned automatically by load (ingestion) time
    • Old partitions can be discarded automatically
Term
How would you load data from Cloud Storage into BigQuery tables?
Definition
From the command line, use the bq load command.
Term
How can multiple tables be queried using wildcards in BigQuery with Standard SQL?
Definition
Enclose the wildcard table name (ending in *) in backticks (`)
Term
What are the inputs and outputs of a transformation called in Apache Beam / Dataflow?
Definition
PCollections
Term
What are the components of Apache Beam?
Definition
  • Directed-acyclic graph: DAG
  • Pipeline: A single Data Flow job
  • PCollection: the data sets that are inputs to and outputs from transforms
  • Transform: transforms the data
  • Source, Sink: sources and destinations
  • Driver: defines the computation DAG (pipeline)
  • Runner: executes the DAG on the backend
  • Backends supported:
    • Apache Spark
    • Apache Flink
    • Google Cloud Dataflow
    • Beam Model
Term
What are the elements in Pub / Sub?
Definition
  • Publisher
  • Subscriber
  • Messages
  • Queues: one per subscription
  • Acknowledgement
  • Planes
    • Data plane: moves messages between publishers and subscribers. Servers here are forwarders.
    • Control plane: handles assignment of publishers and subscribers to servers on the data plane. Servers here are routers.
Term
What are the two types of subscriptions in Pub Sub?  How do those subscriptions connect?
Definition
  • Push subscriptions use a WebHook endpoint
  • Pull subscriptions use an HTTPS request to an endpoint
Term
What interface is used to publish messages to Pub/Sub?
Definition
HTTPS requests to googleapis.com
Term
In what order are Pub/Sub messages delivered to a subscriber?
Definition
Random order -- no order guaranteed
Term

Define, for stream processing of Pub/Sub data:

  • Sliding window
  • Sliding interval
Definition
  • Sliding window:  the window of time from which all data is gathered for processing.
  • Sliding interval: The amount of time a window will shift for processing the next sliding window 
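The two definitions can be sketched in plain Python (not the Dataflow/Beam API): given a window length and a sliding interval, the hypothetical helper below lists every window an event timestamp falls into. Because windows overlap whenever the interval is shorter than the window, one event can land in several windows.

```python
def sliding_windows(timestamp, size, period):
    """Return every [start, end) window containing `timestamp`, for windows that are
    `size` long (the sliding window) and start every `period` units (the sliding interval)."""
    windows = []
    start = timestamp - (timestamp % period)  # latest window start at or before the timestamp
    while start > timestamp - size:           # earlier windows still cover the timestamp
        windows.append((start, start + size))
        start -= period
    return sorted(windows)

# 60-second windows that slide every 30 seconds: an event at t=75 falls into two windows
print(sliding_windows(75, size=60, period=30))  # [(30, 90), (60, 120)]
```

When the window length is a multiple of the interval, each event belongs to size / period windows.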
Term
What GCP product is used for Notebooks?
Definition
Cloud Datalab
Term
What does the Python kernel do for Datalab notebooks?
Definition
It manages the notebook session and variables
Term
What is a representation learning ML system?  What name is commonly used for such systems?
Definition

They figure out by themselves what features to pay attention to.  These are commonly called "Deep Learning" systems, which generally refer to Neural Networks.

Term
Describe a neural network model
Definition

A neural network model is made up of layers that feed one another.  Each layer is composed of neurons.  The outer layers (the input and output layers) are called the "visible layers".  The other layers are the "hidden" layers.  The depth of the layers is what gives Deep Learning its name.

Term
What describes a tensor?
Definition
  • Rank - The number of dimensions
    • 0 - Scalar
    • 1 - Vector
    • 2 - Matrix
    • >2 - n-dimensional
  • Shape - The number of elements in each dimension
  • Data Type
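To make rank and shape concrete, the hypothetical helper below computes both for a regular nested Python list. It is a plain-Python sketch; TensorFlow itself exposes these as `tf.rank`, `tensor.shape`, and `tensor.dtype`.

```python
def rank_and_shape(tensor):
    """Return (rank, shape) of a nested-list 'tensor'.
    Assumes the nesting is regular: every sub-list at a given level has the same length."""
    shape = []
    current = tensor
    while isinstance(current, list):
        shape.append(len(current))                 # number of elements in this dimension
        current = current[0] if current else None  # descend into the next dimension
    return len(shape), tuple(shape)

print(rank_and_shape(7))                          # (0, ()): scalar
print(rank_and_shape([1.0, 2.0, 3.0]))            # (1, (3,)): vector
print(rank_and_shape([[1, 2], [3, 4], [5, 6]]))   # (2, (3, 2)): matrix
```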
Term

What are, in Tensorflow

  • Constants
  • Placeholders
  • Variables
Definition
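  • Constants: immutable values that do not change
  • Placeholders: inputs whose values are fed in when the graph is run; they do not change during that run
  • Variables: mutable values (e.g., weights) that are updated repeatedly during training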
Term
What is a TensorFlow feed dictionary?
Definition
A way to supply the graph's input values (for its placeholders) when it is run.
Term

In Tensorflow, what are:

  • Coordinators
  • QueueRunners
Definition
  • Coordinators - a class to manage and work with multiple threads.
  • QueueRunner - allows you to work with multiple elements from a queue in parallel using multiple threads.
Term
What does tf.stack do in TensorFlow?
Definition
Converts multiple tensors to one tensor by adding a dimension, which becomes the index.
Term
Name some distance measures for measuring ML model accuracy
Definition
  • Euclidean: sqrt(sum((xj-xij)^2)), as the crow flies
  • L1/Manhattan/Snake/City block: sum of the absolute differences in each dimension, i.e., the number of horizontal and vertical steps on a grid
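The two measures written out in Python as a short sketch (function names are illustrative):

```python
import math

def euclidean(a, b):
    """Straight-line ('as the crow flies') distance: sqrt of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """L1 / city-block distance: sum of the absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```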
Term
What is K-nearest neighbor
Definition

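K-nearest neighbor is a machine learning algorithm for classification and regression.  It uses a distance measure to find the K training examples closest (most similar) to a given input, then predicts by majority vote among their classes (classification) or by averaging their values (regression).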
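A minimal K-nearest-neighbor classifier, written as a plain-Python sketch with Euclidean distance and majority vote. The toy points, labels, and k=3 are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points.
    `train` is a list of (feature_vector, label) pairs; distance is Euclidean."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"), ((0.9, 1.1), "red"),
         ((5.0, 5.0), "blue"), ((5.2, 4.9), "blue"), ((4.8, 5.1), "blue")]
print(knn_classify(train, (1.1, 1.0)))  # red
print(knn_classify(train, (4.9, 5.0)))  # blue
```

For regression, the vote would be replaced by the mean of the k neighbors' values.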

Term
What is One-hot notation
Definition
One-hot notation encodes a categorical value as a vector in which exactly one element is 1 and all the others are 0; the position of the 1 identifies the value.
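A short Python sketch of one-hot encoding, using a hypothetical list of categories:

```python
def one_hot(value, categories):
    """Return a vector with 1 at the position of `value` in `categories` and 0 elsewhere."""
    vector = [0] * len(categories)
    vector[categories.index(value)] = 1  # raises ValueError for an unknown category
    return vector

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # [0, 1, 0]
```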
Term
What are the two functions applied in each neuron?
Definition
  • A linear (affine) transformation
    • Applies Wx + b to the input
      • W: weights
      • b: bias
        • W and b are variables learned during training
  • An activation function
Term
What is ReLU
Definition
  • Rectified Linear Unit
  • A very commonly used activation function in a neural network neuron.
  • Returns max(0, x), where x is the result of the affine transformation.
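A single neuron can be sketched in plain Python as the affine step followed by ReLU. Weights, inputs, and bias here are arbitrary example numbers, and real frameworks do this with matrix operations over whole layers rather than one neuron at a time.

```python
def affine(weights, inputs, bias):
    """Wx + b for a single neuron: dot product of weights and inputs, plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)

def neuron(weights, inputs, bias):
    """One neuron = affine transformation followed by an activation function (ReLU here)."""
    return relu(affine(weights, inputs, bias))

print(neuron([0.5, -1.0], [2.0, 1.0], 0.25))  # 0.25  (1.0 - 1.0 + 0.25)
print(neuron([0.5, -1.0], [0.0, 1.0], 0.25))  # 0.0   (-0.75 clipped to zero)
```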
Term
In machine learning regression, what are all of the x variables together called?
Definition
the feature vector
Term
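Describe the process by which a linear regression ML model converges to its final variable values
Definition

The model is trained with gradient descent: on each step it computes the gradient of the cost function with respect to the weights and bias, then moves them a small step, scaled by the learning rate, in the opposite direction.  This repeats over many epochs until the cost stops decreasing, i.e., the model converges.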

Term
What are three gradient descent optimizer choices, and how do they differ in batch size?
Definition
  • Stochastic: uses only one training sample per update step
  • Mini-batch: uses a subset of the training data set per update step
  • Batch: uses the whole training set per update step (one step per epoch)
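The three optimizer variants differ only in how many samples feed each update. As a minimal sketch (plain Python, not TensorFlow's optimizer API), the function below fits y = w*x + b by gradient descent on mean squared error, with `batch_size` selecting stochastic, mini-batch, or batch behavior. The learning rate, epoch count, and noiseless toy data are arbitrary illustrative choices.

```python
import random

def train_linear_regression(xs, ys, batch_size, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = w*x + b with gradient descent on mean squared error.
    batch_size=1 -> stochastic; 1 < batch_size < len(xs) -> mini-batch;
    batch_size=len(xs) -> (full) batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):  # one epoch = one pass over the training set
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of mean((w*x + b - y)^2) over the batch, w.r.t. w and b
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Step against the gradient, scaled by the learning rate
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2 * x + 1 for x in xs]  # points on the line y = 2x + 1, no noise
for size in (1, 4, len(xs)):  # stochastic, mini-batch, batch
    w, b = train_linear_regression(xs, ys, batch_size=size)
    print(size, round(w, 3), round(b, 3))
```

Smaller batches give more (noisier) updates per epoch; the full batch gives one exact-gradient step per epoch.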
Term
What three things must you decide and provide to the training model?
Definition
  • Batch size
  • Number of steps
  • optimizer function
Term
Describe logistic regression
Definition
  • Provides probability of a y value given an x value
  • Produces an S curve
  • The x variable can be continuous, but the y variable can only be categorical, e.g., binary
  • p(yi) = 1/(1+e^-(A+Bxi))
  • Often used for linear classification
Term
What does the TensorFlow Logit function do?
Definition
Transforms a logistic regression S curve into a linear regression's straight line.
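The logistic regression formula p(yi) = 1/(1+e^-(A+Bxi)) and the logit transformation are inverses of each other. A plain-Python sketch of the math (not a TensorFlow call; the coefficient values A and B are arbitrary examples):

```python
import math

def logistic(x, a, b):
    """p(y) = 1 / (1 + e^-(A + Bx)): maps any real input to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def logit(p):
    """log(p / (1 - p)): inverse of the logistic function, turning a probability
    back into the linear score A + Bx."""
    return math.log(p / (1.0 - p))

a, b = -1.0, 0.5        # example coefficients
p = logistic(4.0, a, b)  # A + Bx = -1.0 + 0.5 * 4.0 = 1.0
print(p)                 # about 0.731
print(logit(p))          # about 1.0, recovering A + Bx
```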
Term

What activation function is applied after the affine transformation in:

  • Linear regression
  • Logistic regression
Definition
  • Linear regression: identity (does nothing)
  • Logistic regression: softmax (transforms the affine transformation's output into probabilities)
Term
What is the activation function used in logistic regression?
Definition
Softmax function
Term
What cost function is used for logistic regression?
Definition
Cross Entropy
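Softmax and cross entropy are usually used together: softmax turns the affine outputs into class probabilities, and cross entropy scores those probabilities against the one-hot true label. A minimal plain-Python sketch of both formulas, not TensorFlow's implementation; the example scores are made up.

```python
import math

def softmax(scores):
    """Turn raw scores (e.g. affine-transformation outputs) into probabilities that sum to 1.
    Subtracting the max first avoids overflow in exp() without changing the result."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, one_hot_label):
    """Cross-entropy cost: -sum(y_true * log(y_pred)); low when the true class gets high probability."""
    return -sum(y * math.log(p) for p, y in zip(probabilities, one_hot_label) if y)

probs = softmax([2.0, 1.0, 0.1])
print(probs)                            # three probabilities summing to 1, largest first
print(cross_entropy(probs, [1, 0, 0]))  # small: the true class has the highest probability
print(cross_entropy(probs, [0, 0, 1]))  # larger: the true class has a low probability
```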