Feature/docs by BeardedWhale · Pull Request #10 · Hydrospheredata/hydro-visualization

BeardedWhale · 2020-04-29T18:48:26Z

Added user and developer docs

Valenzione

Overall ok, doc coverage is great

Two major points:

Move the API-related documentation to the OpenAPI spec; it will make dev-doc.md clearer and more concise.
user-doc.md currently lacks the user's perspective. For each hydro-vis feature we need a clear message: why we created it, and when (in which situation) the user might want to use it. In its current state the doc is more about "What hydro-vis is" than about "Why we did so".

Keep up the good work!

Valenzione · 2020-05-06T10:18:28Z

+# Why to use and what it does
+
+Visualization of embedding space of your model can bring you various insights about your data and model performance. 
+
+Embeddings are low-dimensional, learned continuous vector representations of discrete variables
+
+embeddings can be used to:
+
+- find nearest points (points that your model considered close to each other)
+- detect domain drift
+- detect data where model makes mistakes
+- detect closes counterfactual - points that are close to each other but are classified by model as different
+
+Lets see what information our service provides:
+
+- Visualization of all production requests embeddings with various colorings:
+    - Colouring based on model prediction
+    - Colouring based on model confidence in predictions
+    - Colouring based on scores of your monitoring models
+- Closest requests to specific request
+- Closest counterfactuals to specific request
+- All information about request


This paragraph answers the questions "What can embeddings be used for?" and "What is hydro-vis?", but there is still no direct, concrete answer to "Why hydro-vis?".

About OpenAPI: it is already here; I point to it at the very beginning of the doc.

Valenzione · 2020-05-06T10:18:39Z

+embeddings can be used to:
+
+- find nearest points (points that your model considered close to each other)
+- detect domain drift


Valenzione · 2020-05-06T10:18:57Z

+
+- find nearest points (points that your model considered close to each other)
+- detect domain drift
+- detect data where model makes mistakes


How? (Can it really?)

Valenzione · 2020-05-06T10:19:33Z

+- find nearest points (points that your model considered close to each other)
+- detect domain drift
+- detect data where model makes mistakes
+- detect closes counterfactual - points that are close to each other but are classified by model as different


Counterfactuals are calculated, not detected. There is still no clear explanation of why we might want to look at counterfactuals.
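
To make "calculated" concrete, here is a minimal sketch. It assumes a counterfactual is the nearest embedding whose predicted class differs from the query's, as the quoted bullet describes. The function name, the use of Euclidean distance, and the sample data are hypothetical and are not taken from the hydro-vis code.

```python
import math
from typing import Optional, Sequence


def closest_counterfactual(
    query_idx: int,
    embeddings: Sequence[Sequence[float]],
    predicted: Sequence[str],
) -> Optional[int]:
    """Return the index of the nearest point whose predicted class differs
    from the query's, or None if every point shares the query's class."""
    best_idx, best_dist = None, math.inf
    for i, emb in enumerate(embeddings):
        # Skip the query itself and points with the same predicted class.
        if i == query_idx or predicted[i] == predicted[query_idx]:
            continue
        dist = math.dist(embeddings[query_idx], emb)  # Euclidean distance (assumption)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


# Hypothetical data: four 2-D embeddings with their predicted classes.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.3, 0.0], [5.0, 5.0]]
predicted = ["cat", "cat", "dog", "dog"]
nearest_cf = closest_counterfactual(0, embeddings, predicted)
```

Here the closest point overall (index 1) is skipped because it shares the query's class, so the result is the nearest point of a different class.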

Valenzione · 2020-05-06T10:20:08Z

+Lets see what information our service provides:
+
+- Visualization of all production requests embeddings with various colorings:
+    - Colouring based on model prediction


We provide such coloring to solve which problem?

yes, based on returned class and confidence

Valenzione · 2020-05-06T10:21:04Z

+    - Colouring based on model confidence in predictions
+    - Colouring based on scores of your monitoring models


Same goes for these two. It's "What" but not "Why"

Valenzione · 2020-05-06T10:22:05Z

+
+## 1. Create Model and Application
+
+Create your model, which will receive some inputs and return outputs which contain field `embedding`. Embedding should be a 1 D vector.  Upload your model using command `hs upload`. 


Minor comment: I'd use shape notation in the form of a tuple rather than "1 D vector".
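
As an illustration of the tuple notation, a 1-D vector of length N has shape `(N,)`, which is the convention NumPy uses for array shapes. The sketch below is plain Python; the helper name and the sample embedding are hypothetical.

```python
from typing import Sequence, Tuple


def shape_of_vector(vector: Sequence[float]) -> Tuple[int]:
    """Return the shape of a flat (1-D) sequence as a one-element tuple."""
    # Reject nested sequences: those would have more than one dimension.
    if any(isinstance(x, (list, tuple)) for x in vector):
        raise ValueError("expected a 1-D vector, got a nested sequence")
    return (len(vector),)


embedding = [0.12, -0.40, 0.88, 0.05]  # hypothetical model output
embedding_shape = shape_of_vector(embedding)
```

The doc sentence could then say that the embedding has shape `(N,)`, with N being the embedding size.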

Valenzione · 2020-05-06T10:23:30Z

+```
+
+
+# API


We can omit this section in this doc, since it's described thoroughly in OpenAPI spec

Yes, it is described there, but the OpenAPI spec has no additional information. Here I add some comments on when to use specific requests.

Valenzione · 2020-05-06T10:24:00Z

+
+   visualization_metrics - metrics that are used to evaluate how good will visualization reflect your real multidimensional data in 2D/3D plot. More on visualization metrics you can find [here](#visualization-metrics) 
+
+   possible visualization metrics:


It's better to put it into the OpenAPI spec:

https://github.com/provectus/hydro-visualization/blob/04cfc632096a72120ca419309a300f17621aa1ac/openapi.yaml#L154

Valenzione · 2020-05-06T10:24:18Z

+
+    Returns state of a task and result if ready
+
+    states: = ['PENDING', 'RECEIVED', 'STARTED', 'FAILURE', 'REVOKED',  'RETRY'] (Source: [Celery Docs](https://docs.celeryproject.org/en/latest/reference/celery.states.html#all-states))


Same here: put it into the OpenAPI spec.
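
For context on how a client might interpret these states, here is a minimal sketch. The state groupings mirror `celery.states.READY_STATES` and `celery.states.UNREADY_STATES`, redefined locally so the example runs without Celery installed. Note that the list quoted above omits `SUCCESS`, which Celery also defines. The function name and the returned labels are hypothetical and are not part of the hydro-vis API.

```python
# State names mirror celery.states; defined locally so the sketch
# runs without Celery installed.
READY_STATES = frozenset({"SUCCESS", "FAILURE", "REVOKED"})
UNREADY_STATES = frozenset({"PENDING", "RECEIVED", "STARTED", "REJECTED", "RETRY"})


def describe_task(state: str) -> str:
    """Map a Celery task state to a coarse label (labels are hypothetical)."""
    if state == "SUCCESS":
        return "done"          # a result is available
    if state in READY_STATES:
        return "failed"        # finished without a usable result
    if state in UNREADY_STATES:
        return "in progress"   # the client should poll again later
    raise ValueError(f"unknown task state: {state!r}")


labels = {s: describe_task(s) for s in ("PENDING", "STARTED", "SUCCESS", "FAILURE")}
```

A client would keep polling while the label is "in progress" and read the result only once the task is "done".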

BeardedWhale added 7 commits April 24, 2020 19:28

Added user docs and dev docs on api

c91b9c7

Developer docs enhacnement

f6d3a7c

Add doc about embeddings extraction

afe41db

Add doc transforming pipeline

9b59fe4

Added transformers section

aed2a25

Updated metrics section

97dc8a1

Add contents

2819121

BeardedWhale requested a review from Valenzione April 29, 2020 18:48

BeardedWhale self-assigned this Apr 29, 2020

Fix typos

a2f6d26

Valenzione suggested changes May 6, 2020

BeardedWhale added 7 commits May 15, 2020 14:05

Merge remote-tracking branch 'remotes/origin/master' into feature/docs

6f616ee

# Conflicts: # transformation_tasks/tasks.py

Merge remote-tracking branch 'remotes/origin/master' into feature/docs

10f19a8

Updated Openapi

da4f25e

Update user doc

009f9b3

Update user doc

056fe1d

Merge remote-tracking branch 'remotes/origin/master' into feature/docs

cd3d7d1

# Conflicts: # README.md # openapi.yaml # transformation_tasks/tasks.py

update docs

6075c30

akastav unassigned BeardedWhale Apr 22, 2021

2 participants