Google Cloud Functions local development

Preface

Testing and developing Google Cloud Functions via the Cloud Console is tedious, as the function has to be redeployed after every change to the code.

Google’s documentation

https://cloud.google.com/functions/docs/running/overview

for local development of Google Cloud Functions is somewhat simplistic and difficult to follow in places. Therefore, here are my instructions for C#.

General

Download dotnet-sdk-3.1

dotnet new --install Google.Cloud.Functions.Templates

dotnet new gcf-http

The project is named after the parent folder.
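
A minimal sketch of what that means in practice (TestGcpLocalCloudFunction01 is just the placeholder folder name reused in the Docker example below):

mkdir TestGcpLocalCloudFunction01
cd TestGcpLocalCloudFunction01
dotnet new gcf-http
# project file and C# namespace are now both named TestGcpLocalCloudFunction01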

Local with Functions Framework

Just

dotnet run
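
The Functions Framework listens on port 8080 by default, so while dotnet run is active a quick smoke test could look like this (a sketch; the exact greeting depends on the template version):

# in a second terminal, while the function is running:
curl http://localhost:8080/
# should return the template's "Hello, Functions Framework" style response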

Local with Docker / Cloud Native Buildpacks

Install 'pack':
https://buildpacks.io/docs/tools/pack/

Create Docker image:

pack build --builder gcr.io/buildpacks/builder:v1 --env GOOGLE_RUNTIME=dotnet --env GOOGLE_FUNCTION_SIGNATURE_TYPE=http --env GOOGLE_FUNCTION_TARGET=TestGcpLocalCloudFunction01.Function test-gpc-local-function-01

TestGcpLocalCloudFunction01 is the name of the project / namespace in C#. Function is the name of the class.

Note that the image name test-gpc-local-function-01 must be all lowercase.
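
The image can then be started locally with Docker; a minimal sketch (8080 is the port the buildpack-built function listens on by default):

docker run --rm -p 8080:8080 test-gpc-local-function-01
# in a second terminal:
curl http://localhost:8080/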

Using AutoMapper with DevExpress XPO

If you are using AutoMapper (a great tool, by the way) together with DevExpress XPO, it is possible you will run into this error:

automapper needs to have a constructor with 0 args or only optional args

// XPO persistent objects require a Session in their constructor, so they have
// no parameterless constructor. Tell AutoMapper explicitly how to create them:
var autoMapperConfig = new MapperConfiguration(cfg =>
  cfg.CreateMap<SourceObject, MyXpoObject>()
    .ConstructUsing(x => new MyXpoObject(mySession)));

Sure, no rocket science, but I had to search for a while.

Google BigQuery: Export all tables to Cloud Storage

A small shell script to export all tables of a BigQuery dataset to Cloud Storage.

#!/bin/bash

BGPROJECT='myproject01'
BGDATASET='mydataset01'
GSBUCKET='mybucket01'
GSFOLDER='myfolder01'

# keep only regular tables (filter out VIEW and EXTERNAL entries)
BGTABLES=$(bq ls ${BGPROJECT}:${BGDATASET} | grep 'TABLE' | awk '{print $1}')

for i in ${BGTABLES}
do
  echo ${i}
  bq extract --destination_format AVRO --compression SNAPPY ${BGPROJECT}:${BGDATASET}.${i} gs://${GSBUCKET}/${GSFOLDER}/${i}*.avro
done

Sync development folders between two computers

Scenario: I have two computers and I would like to use both seamlessly for development. I write software in C# (Visual Studio), PHP / Symfony, and a little NodeJS and Bash.

My requirements:

  • Do not use Git for syncing. I do not want to commit and push just to pull on the other machine.
  • Do not use a network drive or an external disk. I would like to use cloud storage so I do not have to think about backups.
  • Must run on Windows 10, since I have to develop Windows desktop software with Visual Studio.
  • The Windows Subsystem for Linux (WSL) must have access to the folder, for Git and other CLI tools.

A good option: Dropbox (https://www.dropbox.com/). Very good sync, fast, and it just works. But:

I have a Google GSuite (https://gsuite.google.com/) subscription with unlimited space. I already pay for it and do not want to pay for another service. So my choice is Google Drive (https://www.google.com/drive/download/).

I tested Google Drive File Stream (https://support.google.com/a/answer/7491144), the software recommended by Google. Problem: no access to the local folder from WSL.

I tested Google Backup and Sync (https://support.google.com/drive/answer/2374987?), the "old" software from Google. Problem: no partial sync of folders. Even if you configure partial sync, it seems to check all files on Google Drive. If you have a lot of files, it takes days and never finishes.

My solution: Insync (https://www.insynchq.com/).
Installed it on both computers and set up the so-called "Selective Sync". Works for me.

If you are not bound to Google: Dropbox is much easier.

Microsoft Surface Book 2 vs. Surface Book 3 (15 inch)

CPU

Core™ i7-8650U, 1.9 GHz, Turbo Boost 4.2 GHz, 8 MB
VS
Core™ i7-1065G7, 1.3 GHz, Turbo Boost 3.9 GHz, 8 MB
+8%

Source:
https://cpu.userbenchmark.com/Compare/Intel-Core-i7-8650U-vs-Intel-Core-i7-1065G7/m353957vsm888368

GPU

NVIDIA GeForce GTX 1060 Mobile, 6 GB GDDR5
VS
NVIDIA GeForce GTX 1660 Ti Max-Q, 6 GB GDDR6
+19%

Source:
https://gpu.userbenchmark.com/Compare/Nvidia-GTX-1060-Mobile-vs-Nvidia-GTX-1660-Ti-Mobile-Max-Q/m164336vsm789578

RAM

16 GB LPDDR3-1866
VS
32 GB LPDDR4X-3733
+64% (estimated)

Source:
https://ram.userbenchmark.com/Compare/HyperX-Fury-DDR3-1866-C10-2x8GB-vs-GSKILL-Trident-Z-DDR4-3600-C16-2x8GB/m42962vs3562

Problems solved while updating from Symfony 4.4 to 5.0

Step 1

I followed the upgrade information on the Symfony / Sensio Labs website:

https://symfony.com/doc/current/setup/upgrade_major.html

and changed the Symfony version constraints in the composer.json file.

But DO NOT change the following line (the web-server-bundle was deprecated in Symfony 4.4 and has no 5.0 release):

"symfony/web-server-bundle": "4.4.*",

Step 2

I could not update due to this error message:

The requested package nelmio/cors-bundle 
(locked at 1.5.6, required as ^2.0) is satisfiable by 
nelmio/cors-bundle[1.5.6] but these conflict with your 
requirements or minimum-stability.

I had to change the file composer.lock from

"name": "nelmio/cors-bundle",
"version": "1.5.6",

to

"name": "nelmio/cors-bundle",
"version": "2.0.1",

Step 3

Last error message was

Uncaught Error: Class 'Symfony\Component\Debug\Debug' not found

and could be fixed using the information from

https://github.com/symfony/recipes/blob/master/symfony/framework-bundle/4.4/public/index.php

by changing public/index.php

from

use Symfony\Component\Debug\Debug;

to

use Symfony\Component\ErrorHandler\Debug;

Restore old version of a BigQuery scheduled query

If you (accidentally) changed a Google BigQuery scheduled query and would like to go back to the old, working version, you can do the following.

First: find the last correctly running job.
Open the scheduled queries page in your browser. Click on the name to get the details and select a date on the next page. On the right side, look for "Job" and copy the job ID. It should look like

12345678:scheduled_query_12qwas-12qwas-12qwas

Second: open a console.
If not installed yet, get the BigQuery command-line tool:
https://cloud.google.com/bigquery/docs/bq-command-line-tool
or use the Google Cloud web console.

Once it is installed and configured, you can call

bq --format=prettyjson show -j <your_job_id>

Maybe pipe the output into a file, find the query, copy it, remove the "\n" escapes in a text editor, and you have it.
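
If jq is available, the same thing can be scripted; a small sketch (the job ID is the placeholder from above, and .configuration.query.query is where I would expect the SQL inside the job JSON):

# dump the job metadata to a file
bq --format=prettyjson show -j 12345678:scheduled_query_12qwas-12qwas-12qwas > job.json
# -r prints the raw string, so the \n escapes become real line breaks
jq -r '.configuration.query.query' job.json > restored_query.sql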

Recommendation engine with Google BigQuery ML Machine Learning

Preface: It was not successful. 

I would like to implement a small recommendation engine using Google BigQuery ML. We have anonymized order data in Google BigQuery and the idea is:

Find all combinations of product 1 and product 2 across all orders where two or more products were purchased. Let BigQuery ML learn from them. If we later have a given product 1, let BigQuery suggest some related products 2 to our customers.

Build a BigQuery SELECT statement to find orders with two order lines / two products purchased:

WITH
  TableFirstProduct AS (
  SELECT
    *
  FROM (
    SELECT
      ol1.IncrementId AS IncrementId1,
      ol1.Sku AS Sku1,
      pc1.ParentSku AS ParentSku1,
      pc1.ParentId AS ParentEntityId1,
      pc1.Price,
      pc1.MyCustomProperty01,
      pc1.MyCustomProperty02,
      pc1.MyCustomProperty03,
      pc1.MyCustomProperty04,
      pc1.Color,
      ROW_NUMBER() OVER (PARTITION BY ol1.IncrementId) AS RowNumber1
    FROM
      `myproject01.mydataset01.order_lines` ol1
    JOIN
      `myproject01.mydataset01.product_childs` pc1
    ON
      ol1.ProductId = pc1.EntityId )
  WHERE
    RowNumber1 = 1),
  TableSecondProduct AS (
  SELECT
    *
  FROM (
    SELECT
      ol2.IncrementId AS IncrementId2,
      ol2.Sku AS Sku2,
      pc2.ParentSku AS ParentSku2,
      pc2.ParentId AS ParentEntityId2,
      ROW_NUMBER() OVER (PARTITION BY ol2.IncrementId) AS RowNumber2
    FROM
      `myproject01.mydataset01.order_lines` ol2
    JOIN
      `myproject01.mydataset01.product_childs` pc2
    ON
      ol2.ProductId = pc2.EntityId )
  WHERE
    RowNumber2 = 2 )
SELECT
  t1.*,
  t2.*
FROM
  TableFirstProduct t1
JOIN
  TableSecondProduct t2
ON
  t1.IncrementId1 = t2.IncrementId2
ORDER BY
  t1.IncrementId1

Saved as a view named "cart_2_products".
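
If you prefer the command line over the web UI, the view could also be created with bq; a hedged sketch (the dataset name recommandations is taken from the model statements below, and QUERY is a placeholder for the SELECT statement above):

# create the view from the shell instead of saving it in the web UI
QUERY='SELECT ... the statement above ...'
bq mk --use_legacy_sql=false --view "${QUERY}" recommandations.cart_2_products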

Create the model following this documentation:

https://cloud.google.com/bigquery/docs/bigqueryml-analyst-start

#standardSQL
CREATE OR REPLACE MODEL `myproject01.recommandations.recom_product2`
OPTIONS(model_type='logistic_reg') AS
SELECT
  ParentSku2 as label,
  ParentSku1,
  Price,
  MyCustomProperty01,
  MyCustomProperty02,
  MyCustomProperty03,
  MyCustomProperty04,
  Color
FROM
  `myproject01.recommandations.cart_2_products`

Error:

Logistic regression currently only supports binary classification and the label column had 43305 unique labels instead of 2.

Ok, changed to

(model_type='linear_reg')

Error:

'label' column should be numerical.

Fixed by using EntityId instead of SKU; I am not sure if this is the correct way to do it.

#standardSQL
CREATE OR REPLACE MODEL `myproject01.recommandations.recom_product2`
OPTIONS
  (model_type='linear_reg') AS
SELECT
  ParentEntityId2 AS label,
  ParentSku1,
  Price,
  MyCustomProperty01,
  MyCustomProperty02,
  MyCustomProperty03,
  MyCustomProperty04,
  Color
FROM
  `myproject01.recommandations.cart_2_products`

Error:

Label column 'label' has NULL values.

Fixing:

IF(ParentEntityId2 IS NULL,
    0,
    ParentEntityId2) AS label,

or full query

#standardSQL
CREATE OR REPLACE MODEL `myproject01.recommandations.recom_product2`
OPTIONS
  (model_type='linear_reg') AS
SELECT
  IF(ParentEntityId2 IS NULL,
    0,
    ParentEntityId2) AS label,
  ParentSku1,
  Price,
  MyCustomProperty01,
  MyCustomProperty02,
  MyCustomProperty03,
  MyCustomProperty04,
  Color
FROM
  `myproject01.recommandations.cart_2_products`

The CREATE MODEL statement ran for over 6 minutes…

Testing the model via ML.EVALUATE

SELECT
  *
FROM
  ML.EVALUATE(MODEL `myproject01.recommandations.recom_product2`,
    (
    SELECT
      IF(ParentEntityId2 IS NULL,
        0,
        ParentEntityId2) AS label,
      ParentSku1,
      Price,
      MyCustomProperty01,
      MyCustomProperty02,
      MyCustomProperty03,
      MyCustomProperty04,
      Color
    FROM
      `myproject01.recommandations.cart_2_products`
    LIMIT
      10 ))

results in

mean_absolute_error:     23886.669450492096
mean_squared_error:      1.0217489316058102E9
mean_squared_log_error:  0.5414029727864474
median_absolute_error:   18347.787328148363
r2_score:                0.1333862788413296
explained_variance:      0.17761877811782734

To be honest:  I have no clue 🙂

Use the model to predict via ML.PREDICT

SELECT
  *
FROM
  ML.PREDICT(MODEL `myproject01.recommandations.recom_product2`,
    (
    SELECT
      IF(ParentEntityId2 IS NULL,
        0,
        ParentEntityId2) AS label,
      ParentSku1,
      Price,
      MyCustomProperty01,
      MyCustomProperty02,
      MyCustomProperty03,
      MyCustomProperty04,
      Color
    FROM
      `myproject01.recommandations.cart_2_products`
    LIMIT
      10 ))

 

Row  predicted_label     label  ParentSku1  Price  MyCustomProperty01  MyCustomProperty02  MyCustomProperty03  MyCustomProperty04  Color
1    75350.00227889014   36182  SKU01       64.99  Prop0101            Prop0201            null                null                Schwarz
2    35302.296034777406  24184  SKU02        9.99  Prop0102            Prop0201            null                null                Schwarz
3    54069.06814550898   36395  SKU03       49.99  Prop0103            Prop0201            null                null                Grau
4    80172.38105557486   37277  SKU04       26.99  Prop0104            Prop0201            null                null                Schwarz
5    50016.606966097504  25384  SKU05        9.99  Prop0105            Prop0201            null                null                Schwarz
6    47691.50187015592   35497  SKU06       18.99  Prop0106            Prop0201            null                null                Schwarz
7    34641.00410877989   32896  SKU07       69.99  Prop0107            Prop0201            null                null                Grau
8    35884.611568531276  34268  SKU08       59.99  Prop0108            Prop0201            null                null                Blau
9    46193.88571743077   38027  SKU09       24.99  Prop0109            Prop0201            null                null                Blau
10   45592.61311424247   35621  SKU10       64.99  Prop0110            Prop0210            null                null                Schwarz

And I see: it does not make sense. This is the wrong way!

I need exact EntityIds, not continuous predictions, and a linear regression model is not aimed at this use case.

Working a lot with Google BigQuery? Try the superQuery Chrome extension

If you are using the Google BigQuery web interface, I suggest trying the superQuery Chrome extension:
https://chrome.google.com/webstore/detail/superquery-bigquery-optim/lfckfngaeoheoppemkocjjebloiamfdc
by superquery.io @EVALUEX1

You will have features like

  • Multi-Tab
  • Legacy / Standard SQL Auto-Detection
  • Intelligent code completion that is context- and schema-aware
  • Infinite results scrolling