[DISCUSS] FLIP-133: Rework PyFlink Documentation

jincheng sun
Hi folks,

Since the release of Flink 1.11, the number of PyFlink users has continued
to grow. As far as I know, many companies have used PyFlink for data
analysis, and operations and maintenance monitoring workloads have been put
into production (for example, 聚美优品 (Jumei) [1] and 浙江墨芷 (Mozhi) [2]).
According to the feedback we received, the current documentation is not very
friendly to PyFlink users. There are two shortcomings:

- Python-related content is mixed into the Java/Scala documentation, which
makes it difficult to read for users who only care about PyFlink.
- There is already a "Python Table API" section in the Table API
documentation that holds the PyFlink documents, but it contains few articles
and the content is fragmented. It is difficult for beginners to learn from it.

In addition, FLIP-130 introduced the Python DataStream API. Many documents
will be added for those new APIs. To improve the readability and
maintainability of the PyFlink documentation, Wei Zhong and I have discussed
this offline and would like to rework it via this FLIP.

We will rework the documentation around the following three objectives:

- Add a separate section for the Python API under the "Application
Development" section.
- Restructure the current Python documentation into a brand new structure so
that the content is complete and friendly to beginners.
- Improve the documents shared by Python/Java/Scala to make them more
friendly to Python users without affecting Java/Scala users.

More details can be found in FLIP-133:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation

Best,
Jincheng

[1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
[2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Dian Fu
Hi Jincheng,

Thanks a lot for bringing up this discussion and the proposal. +1 for improving the Python API docs.

I have received a lot of feedback from PyFlink beginners about the PyFlink docs, e.g. there is too little material, the Python docs are mixed with the Java docs, and it's not easy for them to find the docs they are looking for.

I think it would greatly improve the user experience if we had one place that includes most of what PyFlink users need to know.

Regards,
Dian

> On July 31, 2020, at 10:14 AM, jincheng sun <[hidden email]> wrote:
>
> Hi folks,
>
> Since the release of Flink 1.11, users of PyFlink have continued to grow. As far as I know there are many companies have used PyFlink for data analysis, operation and maintenance monitoring business has been put into production(Such as 聚美优品[1](Jumei),  浙江墨芷[2] (Mozhi) etc.).  According to the feedback we received, current documentation is not very friendly to PyFlink users. There are two shortcomings:
>
> - Python related content is mixed in the Java/Scala documentation, which makes it difficult for users who only focus on PyFlink to read.
> - There is already a "Python Table API" section in the Table API document to store PyFlink documents, but the number of articles is small and the content is fragmented. It is difficult for beginners to learn from it.
>
> In addition, FLIP-130 introduced the Python DataStream API. Many documents will be added for those new APIs. In order to increase the readability and maintainability of the PyFlink document, Wei Zhong and me have discussed offline and would like to rework it via this FLIP.
>
> We will rework the document around the following three objectives:
>
> - Add a separate section for Python API under the "Application Development" section.
> - Restructure current Python documentation to a brand new structure to ensure complete content and friendly to beginners.    
> - Improve the documents shared by Python/Java/Scala to make it more friendly to Python users and without affecting Java/Scala users.
>
> More detail can be found in the FLIP-133: https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>
> Best,
> Jincheng
>
> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Xingbo Huang
Hi Jincheng,

Thanks a lot for bringing up this discussion and the proposal.

Big +1 for improving the structure of PyFlink doc.

It will be very helpful to give PyFlink users a unified entry point for
learning from the PyFlink documentation.

Best,
Xingbo

Dian Fu <[hidden email]> wrote on Friday, July 31, 2020, at 11:00 AM:

> Hi Jincheng,
>
> Thanks a lot for bringing up this discussion and the proposal. +1 to
> improve the Python API doc.
>
> I have received many feedbacks from PyFlink beginners about
> the PyFlink doc, e.g. the materials are too few, the Python doc is mixed
> with the Java doc and it's not easy to find the docs he wants to know.
>
> I think it would greatly improve the user experience if we can have one
> place which includes most knowledges PyFlink users should know.
>
> Regards,
> Dian

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Hequn Cheng-2
Hi Jincheng,

Thanks a lot for raising the discussion. +1 for the FLIP.

I think this will bring big benefits to PyFlink users. Currently, the
Python Table API documentation is hidden deep under the Table API & SQL tab,
which makes it hard to find and read. Also, the PyFlink documentation is
mixed with the Java/Scala documentation. It is hard for users to get an
overview of all the PyFlink documents. As more and more functionality is
added to PyFlink, I think it's time for us to refactor the documentation.

Best,
Hequn


On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <[hidden email]>
wrote:

> Hi, Jincheng!
>
> Thanks for creating this detailed FLIP, it will make a big difference in
> the experience of Python developers using Flink. I'm interested in
> contributing to this work, so I'll reach out to you offline!
>
> Also, thanks for sharing some information on the adoption of PyFlink, it's
> great to see that there are already production users.
>
> Marta
>
> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <[hidden email]> wrote:
>
> > Hi Jincheng,
> >
> > Thanks a lot for bringing up this discussion and the proposal.
> >
> > Big +1 for improving the structure of PyFlink doc.
> >
> > It will be very friendly to give PyFlink users a unified entrance to learn
> > PyFlink documents.
> >
> > Best,
> > Xingbo

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Shuiqiang Chen
Hi jincheng,

Thanks for the discussion. +1 for the FLIP.

Well-organized documentation will greatly improve the efficiency and
experience of developers.

Best,
Shuiqiang

Hequn Cheng <[hidden email]> wrote on Saturday, August 1, 2020, at 8:42 AM:

> Hi Jincheng,
>
> Thanks a lot for raising the discussion. +1 for the FLIP.
>
> I think this will bring big benefits for the PyFlink users. Currently, the
> Python TableAPI document is hidden deeply under the TableAPI&SQL tab which
> makes it quite unreadable. Also, the PyFlink documentation is mixed with
> Java/Scala documentation. It is hard for users to have an overview of all
> the PyFlink documents. As more and more functionalities are added into
> PyFlink, I think it's time for us to refactor the document.
>
> Best,
> Hequn

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

jincheng sun
It would be great if you could join the work on the PyFlink
documentation, @Marta!
Thanks for all of the positive feedback. I will start a formal vote
later...

Best,
Jincheng


Shuiqiang Chen <[hidden email]> wrote on Monday, August 3, 2020, at 9:56 AM:

> Hi jincheng,
>
> Thanks for the discussion. +1 for the FLIP.
>
> A well-organized documentation will greatly improve the efficiency and
> experience for developers.
>
> Best,
> Shuiqiang

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Seth Wiesman
Hi Jincheng,

I'm very excited to see the enthusiasm for documentation work, but I am
concerned about the community's long-term ability to maintain this
contribution. In particular, I'm concerned that this proposal duplicates a
lot of content that will quickly get out of sync. So far the community does
not have a great track record for maintaining documentation after its
initial contribution.

In particular, I do not believe the following items need to be copied:

DataTypes
Built-in functions
Connectors
SQL
Catalogs
Configurations

Another issue is that this proposal feels like it is documenting PyFlink
separately from the rest of the project. Things like the cookbook and
tutorial should be under the Try Flink section of the documentation.

Seth


On Mon, Aug 3, 2020 at 1:08 AM jincheng sun <[hidden email]>
wrote:

> Would be great if you could join the contribution of PyFlink
> documentation @Marta !
> Thanks for all of the positive feedback. I will start a formal vote then
> later...
>
> Best,
> Jincheng

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

jincheng sun
Hi Seth and David,

I'm very happy to have your replies and suggestions. I would like to share my
thoughts here:

The main motivation for refactoring the PyFlink docs is that we want to
make sure that Python users can find everything they need starting from the
PyFlink documentation main page. That is, the PyFlink documentation should
have a table of contents that covers all the functionality available in
PyFlink. However, this doesn't mean that we will copy content from other
parts of the documentation. Where needed, an entry may be just a
reference/link to the other documentation. For the documentation added under
the PyFlink main page, the principle is that it should only include
Python-specific content, instead of a copy of the Java content.

>>  I'm concerned that this proposal duplicates a lot of content that will
quickly get out of sync. It feels like it is documenting PyFlink separately
from the rest of the project.

Regarding the concerns about maintainability: as mentioned above, the goal
of this FLIP is to provide a clear entry point for the Python API, and the
content under it should only contain information that is useful for
Python users. There are indeed many agenda items in this FLIP that overlap
with the Java documents, but that doesn't mean the content would be copied
from the Java documentation. That is, if the content of a document is the
same as the corresponding Java document, we will add a link to the Java
document, e.g. "Built-in functions" and "SQL". If only part of a topic is
shared with Java, we create a page for the Python-only content and redirect
to the Java document for the shared part, e.g. "Connectors" and "Catalogs".
If the document is Python-only and already exists, we will move it from the
old Python documentation to the new one, e.g. "Configurations". If the
document is Python-only and does not exist yet, we will create a new page
for it, e.g. "DataTypes".
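
To make the four cases above easier to check against each agenda item, here
is a rough sketch of the rule as a small Python function. The names
(Action, plan_page, and the three flags) are made up for illustration only
and are not part of the FLIP or of any tooling.

```python
from enum import Enum


class Action(Enum):
    """Possible ways to handle one agenda item (hypothetical names)."""
    LINK_TO_JAVA_DOC = "link to the existing Java document"
    PYTHON_PAGE_WITH_REDIRECT = "Python page plus redirect to the Java document"
    MOVE_EXISTING_PAGE = "move the page from the old Python docs"
    CREATE_NEW_PAGE = "create a new Python page"


def plan_page(shared_with_java: bool, has_python_only_part: bool,
              python_page_exists: bool) -> Action:
    """Pick a handling strategy for one agenda item, following the four cases."""
    if shared_with_java and not has_python_only_part:
        # Same content as Java, e.g. "Built-in functions", "SQL".
        return Action.LINK_TO_JAVA_DOC
    if shared_with_java:
        # Partly shared, partly Python-only, e.g. "Connectors", "Catalogs".
        return Action.PYTHON_PAGE_WITH_REDIRECT
    if python_page_exists:
        # Python-only and already written, e.g. "Configurations".
        return Action.MOVE_EXISTING_PAGE
    # Python-only and not written yet, e.g. "DataTypes".
    return Action.CREATE_NEW_PAGE
```

For example, plan_page(True, False, False) corresponds to the "SQL" case and
yields the link-only action.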

The main reason we create a new page for Python Data Types is that it
corresponds to Java Data Types only conceptually; the actual content would be
very different from the Java Data Types document. Some of the detailed
differences are as follows:

  - The text in the Java Data Types document is written for users of
JVM-based languages and is hard to follow for users who only know Python.

  - Currently, Python Data Types do not support the "bridgedTo" method,
DataTypes.RAW, DataTypes.NULL, or user-defined types.

  - The sections "Planner Compatibility" and "Data Type Extraction" are only
useful for Java/Scala users.

  - We want to add sections that only apply to Python, such as which
Data Types are currently supported in Python, the mapping between DataType
and Python object type, etc.
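
As a rough illustration of the kind of "DataType to Python object type"
mapping such a page could document, here is a small standalone sketch. It
uses only the Python standard library and does not call PyFlink; the table
contents and the helper name python_type_for are assumptions for discussion,
not the authoritative PyFlink mapping.

```python
import datetime
import decimal

# Illustrative mapping from Table API type names to Python object types.
# Assumption: a sketch for discussion only; the released PyFlink
# documentation is the authority on the real mapping.
PYTHON_TYPE_OF = {
    "BOOLEAN": bool,
    "INT": int,
    "BIGINT": int,
    "DOUBLE": float,
    "VARCHAR": str,
    "DECIMAL": decimal.Decimal,
    "DATE": datetime.date,
    "TIME": datetime.time,
    "TIMESTAMP": datetime.datetime,
    "ARRAY": list,
    "MAP": dict,
}

# Types named in this thread as not yet supported by the Python API.
UNSUPPORTED_IN_PYTHON = {"RAW", "NULL"}


def python_type_for(type_name: str) -> type:
    """Return the Python object type for a type name (hypothetical helper)."""
    name = type_name.upper()
    if name in UNSUPPORTED_IN_PYTHON:
        raise NotImplementedError(f"{name} is not supported in the Python API yet")
    return PYTHON_TYPE_OF[name]
```

A Python-only page could present exactly this kind of table, together with
the list of unsupported types, without repeating the JVM-specific sections.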

I think the root cause of this difference from the existing documents is
that Python is the first non-JVM language we support in Flink. This means
our previous method of sharing documents between Java and Scala may not be
suitable for Python, so we will adopt some quite different methods to
provide documentation for Python users. Of course, we should reduce
maintenance costs as much as possible while ensuring a good user experience.
Furthermore, Python is the first step of Flink's multi-language support, and
there may be R, Go, etc. in the future. It is necessary for us to build a
main page for each language, so that users of each language can focus on
the content they care about.

>> Things like the cookbook and tutorial should be under the Try Flink
section of the documentation.

Regarding the position of the "Cookbook" section: in my view, "Try
Flink" is for new users and the "Cookbook" is for more advanced users.
That is, "Try Flink" can hold the simplest end-to-end examples, such as
"Hello World", and in the "Cookbook" we can add use cases closer to
production business, such as CDN log analysis or PV/UV for e-commerce. So I
prefer to keep the current structure.

>>  it's relatively straightforward to compare the Python API with the Java
and Scala versions.

Regarding the comparison between the Python API and the Java/Scala API, I
think the majority of users, especially beginners, would not have this
need. From my side, improving the experience for beginners seems to have
higher priority. Would you please share more input on why users
want to compare? And how much would the comparison be affected if we put it
on multiple pages? :)

Thanks for all of your feedback and suggestions. Any follow-up feedback is
welcome.

Best,

Jincheng


David Anderson <[hidden email]> wrote on Monday, August 3, 2020, at 10:49 PM:

> Jincheng,
>
> One thing that I like about the way that the documentation is currently
> organized is that it's relatively straightforward to compare the Python API
> with the Java and Scala versions. I'm concerned that if the PyFlink docs
> are more independent, it will be challenging to respond to questions about
> which features from the other APIs are available from Python.
>
> David
>
> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <[hidden email]>
> wrote:
>
>> Would be great if you could join the contribution of PyFlink
>> documentation @Marta !
>> Thanks for all of the positive feedback. I will start a formal vote then
>> later...
>>
>> Best,
>> Jincheng
>>
>>
>> Shuiqiang Chen <[hidden email]> 于2020年8月3日周一 上午9:56写道:
>>
>> > Hi jincheng,
>> >
>> > Thanks for the discussion. +1 for the FLIP.
>> >
>> > A well-organized documentation will greatly improve the efficiency and
>> > experience for developers.
>> >
>> > Best,
>> > Shuiqiang
>> >
>> > On Sat, Aug 1, 2020 at 8:42 AM Hequn Cheng <[hidden email]> wrote:
>> >
>> >> Hi Jincheng,
>> >>
>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>> >>
>> >> I think this will bring big benefits for the PyFlink users. Currently,
>> >> the Python TableAPI document is hidden deeply under the TableAPI&SQL
>> tab
>> >> which makes it quite unreadable. Also, the PyFlink documentation is
>> mixed
>> >> with Java/Scala documentation. It is hard for users to have an
>> overview of
>> >> all the PyFlink documents. As more and more functionalities are added
>> into
>> >> PyFlink, I think it's time for us to refactor the document.
>> >>
>> >> Best,
>> >> Hequn
>> >>
>> >>
>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <
>> [hidden email]>
>> >> wrote:
>> >>
>> >>> Hi, Jincheng!
>> >>>
>> >>> Thanks for creating this detailed FLIP, it will make a big difference
>> in
>> >>> the experience of Python developers using Flink. I'm interested in
>> >>> contributing to this work, so I'll reach out to you offline!
>> >>>
>> >>> Also, thanks for sharing some information on the adoption of PyFlink,
>> >>> it's
>> >>> great to see that there are already production users.
>> >>>
>> >>> Marta
>> >>>
>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <[hidden email]>
>> wrote:
>> >>>
>> >>> > Hi Jincheng,
>> >>> >
>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>> >>> >
>> >>> > Big +1 for improving the structure of PyFlink doc.
>> >>> >
>> >>> > It will be very friendly to give PyFlink users a unified entrance to
>> >>> learn
>> >>> > PyFlink documents.
>> >>> >
>> >>> > Best,
>> >>> > Xingbo
>> >>> >
>> >>> > On Fri, Jul 31, 2020 at 11:00 AM Dian Fu <[hidden email]> wrote:
>> >>> >
>> >>> >> Hi Jincheng,
>> >>> >>
>> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1
>> to
>> >>> >> improve the Python API doc.
>> >>> >>
>> >>> >> I have received many feedbacks from PyFlink beginners about
>> >>> >> the PyFlink doc, e.g. the materials are too few, the Python doc is
>> >>> mixed
>> >>> >> with the Java doc and it's not easy to find the docs he wants to
>> know.
>> >>> >>
>> >>> >> I think it would greatly improve the user experience if we can have
>> >>> one
>> >>> >> place which includes most knowledges PyFlink users should know.
>> >>> >>
>> >>> >> Regards,
>> >>> >> Dian
>> >>> >>
>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun <[hidden email]> wrote:
>> >>> >>
>> >>> >> Hi folks,
>> >>> >>
>> >>> >> Since the release of Flink 1.11, users of PyFlink have continued to
>> >>> grow.
>> >>> >> As far as I know there are many companies have used PyFlink for
>> data
>> >>> >> analysis, operation and maintenance monitoring business has been
>> put
>> >>> into
>> >>> >> production(Such as 聚美优品[1](Jumei),  浙江墨芷[2] (Mozhi) etc.).
>> According
>> >>> to
>> >>> >> the feedback we received, current documentation is not very
>> friendly
>> >>> to
>> >>> >> PyFlink users. There are two shortcomings:
>> >>> >>
>> >>> >> - Python related content is mixed in the Java/Scala documentation,
>> >>> which
>> >>> >> makes it difficult for users who only focus on PyFlink to read.
>> >>> >> - There is already a "Python Table API" section in the Table API
>> >>> document
>> >>> >> to store PyFlink documents, but the number of articles is small and
>> >>> the
>> >>> >> content is fragmented. It is difficult for beginners to learn from
>> it.
>> >>> >>
>> >>> >> In addition, FLIP-130 introduced the Python DataStream API. Many
>> >>> >> documents will be added for those new APIs. In order to increase
>> the
>> >>> >> readability and maintainability of the PyFlink document, Wei Zhong
>> >>> and me
>> >>> >> have discussed offline and would like to rework it via this FLIP.
>> >>> >>
>> >>> >> We will rework the document around the following three objectives:
>> >>> >>
>> >>> >> - Add a separate section for Python API under the "Application
>> >>> >> Development" section.
>> >>> >> - Restructure current Python documentation to a brand new
>> structure to
>> >>> >> ensure complete content and friendly to beginners.
>> >>> >> - Improve the documents shared by Python/Java/Scala to make it more
>> >>> >> friendly to Python users and without affecting Java/Scala users.
>> >>> >>
>> >>> >> More detail can be found in the FLIP-133:
>> >>> >>
>> >>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>> >>> >>
>> >>> >> Best,
>> >>> >> Jincheng
>> >>> >>
>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>> >>> >>
>> >>> >>
>> >>> >>
>> >>>
>> >>
>>
>

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Xingbo Huang
Hi,

I found that the Spark community has also recently been working on
redesigning the PySpark documentation [1]. Maybe we can compare our
document structure with theirs.

[1] https://issues.apache.org/jira/browse/SPARK-31851
http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html

Best,
Xingbo

On Wed, Aug 5, 2020 at 3:17 AM David Anderson <[hidden email]> wrote:

> I'm delighted to see energy going into improving the documentation.
>
> With the current documentation, I get a lot of questions that I believe
> reflect two fundamental problems with what we currently provide:
>
> (1) We have a lot of contextual information in our heads about how Flink
> works, and we are able to use that knowledge to make reasonable inferences
> about how things (probably) work in cases we aren't so familiar with. For
> example, I get a lot of questions of the form "If I use <this feature> will
> I still have exactly once guarantees?" The answer is always yes, but they
> continue to have doubts because we have failed to clearly communicate this
> fundamental, underlying principle.
>
> This specific example about fault tolerance applies across all of the
> Flink docs, but the general idea can also be applied to the Table/SQL and
> PyFlink docs. The guiding principles underlying these APIs should be
> written down in one easy-to-find place.
>
> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
> E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
> very difficult to answer because it is frequently the case that one has to
> reason about why a given feature doesn't seem to appear in the
> documentation. It could be that I'm looking in the wrong place, or it could
> be that someone forgot to document something, or it could be that it can in
> fact be done by applying a general mechanism in a specific way that I
> haven't thought of -- as in this case, where one can use a JDBC sink from
> Python if one thinks to use DDL.
>
> So I think it would be helpful to be explicit about both what is, and what
> is not, supported in PyFlink. And to have some very clear organizing
> principles in the documentation so that users can quickly learn where to
> look for specific facts.
>
> Regards,
> David
>
>
> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <[hidden email]>
> wrote:
>
>> Hi Seth and David,
>>
>> I'm very happy to have your reply and suggestions. I would like to share
>> my thoughts here:
>>
>> The main motivation for refactoring the PyFlink docs is to make sure that
>> Python users can find everything they need starting from the PyFlink
>> documentation main page. That is, the PyFlink documentation should have a
>> catalogue that covers all the functionality available in PyFlink. However,
>> this doesn't mean that we will copy content from other parts of the
>> documentation. Where needed, an entry may be just a reference/link to the
>> other documentation. The principle for pages added under the PyFlink main
>> page is that they should only include Python-specific content, rather than
>> a copy of the Java content.
>>
>> >>  I'm concerned that this proposal duplicates a lot of content that
>> will quickly get out of sync. It feels like it is documenting PyFlink
>> separately from the rest of the project.
>>
>> Regarding the concerns about maintainability: as mentioned above, the goal
>> of this FLIP is to provide an intelligible entry point to the Python API,
>> and it should only contain information that is useful for Python users.
>> There are indeed many entries in this FLIP that duplicate the Java
>> documents, but that doesn't mean the content would be copied from the Java
>> documentation. That is, if the content of a document is the same as the
>> corresponding Java document, we will add a link to the Java document, e.g.
>> "Built-in functions" and "SQL". If only part of it is shared with Java, we
>> create a page for the Python-only content and redirect to the Java document
>> for the shared part, e.g. "Connectors" and "Catalogs". If the document is
>> Python-only and already exists, we will move it from the old Python
>> documentation to the new one, e.g. "Configurations". If the document is
>> Python-only and does not exist yet, we will create a new page for it, e.g.
>> "DataTypes".
>>
>> The main reason we create a new page for Python Data Types is that it
>> corresponds to Java Data Types only conceptually; the actual content would
>> be very different from the Java Data Types page. Some of the differences
>> are as follows:
>>
>>
>>
>>   - The text in the Java Data Types document is written for users of
>> JVM-based languages, and is hard to follow for users who only know
>> Python.
>>
>>   - Currently the Python Data Types do not support the "bridgedTo" method,
>> DataTypes.RAW, DataTypes.NULL, or User Defined Types.
>>
>>   - The sections "Planner Compatibility" and "Data Type Extraction" are
>> only useful for Java/Scala users.
>>
>>   - We want to add sections that may only apply to Python, such as which
>> Data Types are currently supported in Python, the mapping between DataType
>> and Python object types, etc.
>>
>> I think the root cause of these differences from the existing documents is
>> that Python is the first non-JVM language we support in Flink. This means
>> our previous approach of sharing documents between Java and Scala may not
>> be suitable for Python, so we will adopt some quite different approaches to
>> documenting for Python users. Of course, we should keep maintenance costs
>> as low as possible while ensuring a good user experience. Furthermore,
>> Python is the first step of Flink's multi-language support, and there may
>> be R, Go, etc. in the future. It is necessary for us to have a main page
>> for each language, so that users of each language can focus on the content
>> they care about.
>>
>> >> Things like the cookbook and tutorial should be under the Try Flink
>> section of the documentation.
>>
>> Regarding the position of the "Cookbook" section: as I see it, "Try Flink"
>> is for new users and the "Cookbook" is for more advanced users. That is,
>> "Try Flink" can hold the simplest end-to-end example, such as "Hello
>> World", while the "Cookbook" can hold use cases closer to production, such
>> as CDN log analysis or PV/UV statistics for e-commerce. So I prefer to keep
>> the current structure.
>>
>> >>  it's relatively straightforward to compare the Python API with the
>> Java and Scala versions.
>>
>> Regarding the comparison between the Python API and the Java/Scala API, I
>> think the majority of users, especially beginners, would not need this.
>> From my side, improving the experience for beginners seems the higher
>> priority. Could you share more about why users want to compare? And how
>> much would the comparison suffer if it were spread across multiple
>> pages? :)
>>
>> Thanks for all of your feedback and suggestions, any follow-up feedback
>> is welcome.
>>
>> Best,
>>
>> Jincheng
>>
>>

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Wei Zhong
Hi Xingbo,

Thanks for the information.

I think the PySpark documentation redesign deserves our attention. It seems that the Spark community has also begun to take the user experience of its Python documentation more seriously. We can keep following the discussion and progress of the redesign in the Spark community. It is so similar to our work that there should be some ideas worth borrowing.

Best,
Wei




Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

jincheng sun
Hi David, thank you for sharing the problems with the current documentation.
I agree with you, as I have received the same feedback from Chinese users.
Users often contact me to ask whether PyFlink supports "Java UDF" or
"xxxConnector". The root cause of these questions is that our existing
documents are written for Java users (with text and API content mixed
together). Since Python support was only added in 1.9, much of the
documentation is not friendly to Python users, who don't want to look for
Python content in unfamiliar Java documents. Just yesterday, Chinese users
complained that they could not find where all the Python API document
entries are. So a centralized entry point and a clear document structure
are what Python users urgently need. The original intention of this FLIP is
to do our best to solve these user pain points.
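
To make the "xxxConnector" question concrete: as David points out in the
quoted message, a JDBC sink can be used from PyFlink by declaring it through
DDL. Below is a minimal sketch, not an official example. The helper names,
table name, columns, and URL are made up for illustration. The 'connector',
'url', and 'table-name' options are the ones I understand the JDBC SQL
connector to accept as of Flink 1.11, and the connector and driver jars are
assumed to be on the classpath.

```python
def jdbc_sink_ddl(table_name, jdbc_url, remote_table, columns):
    """Build a CREATE TABLE statement that declares a JDBC sink.

    `columns` is a list of (column name, SQL type) pairs.
    """
    column_sql = ",\n  ".join(
        "`{}` {}".format(name, sql_type) for name, sql_type in columns
    )
    return (
        "CREATE TABLE `{table}` (\n"
        "  {columns}\n"
        ") WITH (\n"
        "  'connector' = 'jdbc',\n"
        "  'url' = '{url}',\n"
        "  'table-name' = '{remote}'\n"
        ")"
    ).format(table=table_name, columns=column_sql, url=jdbc_url, remote=remote_table)


def register_jdbc_sink(t_env, ddl):
    """Run the DDL on a PyFlink TableEnvironment (anything with execute_sql)."""
    return t_env.execute_sql(ddl)


# Hypothetical table, columns, and URL, for illustration only.
ddl = jdbc_sink_ddl(
    "orders_sink",
    "jdbc:mysql://localhost:3306/shop",
    "orders",
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
)
print(ddl)
```

With a real TableEnvironment this would be called as
register_jdbc_sink(t_env, ddl), after which the sink can be referred to by
its table name in later statements.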

Hi Xingbo and Wei, thank you for sharing the status of PySpark's
documentation redesign. You're right. PySpark already has a large Python
user base, and the Spark community has also found that the Python user
community plays an important role in multi-language support. Centralizing
and unifying the Python documentation will reduce the learning cost for
Python users, and a good document structure and content will also reduce
the Q&A burden on the community. It is a job that pays off once and for all.

Hi Seth, I wonder if your concerns have been resolved through the previous
discussion?

Anyway, the principle of this FLIP is that the Python documentation should
only include Python-specific content, instead of making a copy of the Java
content. It would be great to have you join in improving PyFlink (both
submitting PRs and reviewing PRs).

Best,
Jincheng


Wei Zhong <[hidden email]> wrote on Wed, Aug 5, 2020 at 5:46 PM:

> Hi Xingbo,
>
> Thanks for your information.
>
> I think PySpark's documentation redesign deserves our attention. It seems
> that the Spark community has also begun to treat the user experience of
> its Python documentation more seriously. We can keep following the
> discussion and progress of the redesign in the Spark community. It is so
> similar to our work that there should be some ideas worth borrowing.
>
> Best,
> Wei
>
>
> On Aug 5, 2020, at 15:02, Xingbo Huang <[hidden email]> wrote:
>
> Hi,
>
> I found that the Spark community has also recently been working on
> redesigning the PySpark documentation[1]. Maybe we can compare our
> document structure with theirs.
>
> [1] https://issues.apache.org/jira/browse/SPARK-31851
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>
> Best,
> Xingbo
>
> David Anderson <[hidden email]> wrote on Wed, Aug 5, 2020 at 3:17 AM:
>
>> I'm delighted to see energy going into improving the documentation.
>>
>> With the current documentation, I get a lot of questions that I believe
>> reflect two fundamental problems with what we currently provide:
>>
>> (1) We have a lot of contextual information in our heads about how Flink
>> works, and we are able to use that knowledge to make reasonable inferences
>> about how things (probably) work in cases we aren't so familiar with. For
>> example, I get a lot of questions of the form "If I use <this feature> will
>> I still have exactly once guarantees?" The answer is always yes, but they
>> continue to have doubts because we have failed to clearly communicate this
>> fundamental, underlying principle.
>>
>> This specific example about fault tolerance applies across all of the
>> Flink docs, but the general idea can also be applied to the Table/SQL and
>> PyFlink docs. The guiding principles underlying these APIs should be
>> written down in one easy-to-find place.
>>
>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
>> E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
>> very difficult to answer because it is frequently the case that one has to
>> reason about why a given feature doesn't seem to appear in the
>> documentation. It could be that I'm looking in the wrong place, or it could
>> be that someone forgot to document something, or it could be that it can in
>> fact be done by applying a general mechanism in a specific way that I
>> haven't thought of -- as in this case, where one can use a JDBC sink from
>> Python if one thinks to use DDL.
>>
>> So I think it would be helpful to be explicit about both what is, and
>> what is not, supported in PyFlink. And to have some very clear organizing
>> principles in the documentation so that users can quickly learn where to
>> look for specific facts.
>>
>> Regards,
>> David
>>
>>
>> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <[hidden email]>
>> wrote:
>>
>>> Hi Seth and David,
>>>
>>> I'm very happy to have your reply and suggestions. I would like to share
>>> my thoughts here:
>>>
>>> The main motivation for refactoring the PyFlink docs is to make sure
>>> that Python users can find everything they need starting from the
>>> PyFlink documentation main page. That is, the PyFlink documentation
>>> should have a catalogue which includes all the functionality available in
>>> PyFlink. However, this doesn't mean that we will copy content from other
>>> parts of the documentation. It may be just a reference/link to the other
>>> documentation if needed. For documentation added under the PyFlink main
>>> page, the principle is that it should only include Python-specific
>>> content, instead of making a copy of the Java content.
>>>
>>> >>  I'm concerned that this proposal duplicates a lot of content that
>>> will quickly get out of sync. It feels like it is documenting PyFlink
>>> separately from the rest of the project.
>>>
>>> Regarding the concerns about maintainability: as mentioned above, the
>>> goal of this FLIP is to provide an intelligible entry point to the Python
>>> API, and it should only contain information which is useful for Python
>>> users. There are indeed many agenda items in this FLIP that duplicate the
>>> Java documents, but that doesn't mean the content will be copied from the
>>> Java documentation. There are four cases:
>>>
>>>   - If the content is the same as the corresponding Java document, we
>>> will add a link to the Java document, e.g. "Built-in functions" and "SQL".
>>>   - If only part of the content is shared with Java, we will create a
>>> page for the Python-only content and redirect to the Java document for
>>> the shared part, e.g. "Connectors" and "Catalogs".
>>>   - If the document is Python-only and already exists, we will move it
>>> from the old Python document to the new Python document, e.g.
>>> "Configurations".
>>>   - If the document is Python-only and does not exist yet, we will create
>>> a new page for it, e.g. "DataTypes".
>>>
>>> The main reason we create a new page for Python Data Types is that it
>>> corresponds to Java Data Types only conceptually; the actual document
>>> content would be very different from Java DataTypes. Some of the
>>> detailed differences are as follows:
>>>
>>>
>>>   - The text in the Java Data Types document is written for users of
>>> JVM-based languages, and is incomprehensible to users who only know
>>> Python.
>>>   - Currently, Python Data Types do not support the "bridgedTo" method,
>>> DataTypes.RAW, DataTypes.NULL, or User Defined Types.
>>>   - The sections "Planner Compatibility" and "Data Type Extraction" are
>>> only useful for Java/Scala users.
>>>   - We want to add sections which may only apply to Python, such as
>>> which Data Types are currently supported in Python, the mapping between
>>> DataType and Python object type, etc.
>>>
>>> I think the root cause of such differences from the existing documents
>>> is that Python is the first non-JVM language we support in Flink. This
>>> means our previous method of sharing documents between Java and Scala may
>>> not be suitable for Python, so we will adopt some very different methods
>>> to provide documentation for Python users. Of course, we should reduce
>>> maintenance costs as much as possible while ensuring a good user
>>> experience. Furthermore, Python is the first step of Flink's
>>> multi-language support, and there may be R, Go, etc. in the future. It is
>>> necessary for us to create a main page for each language, so that users
>>> of each language can focus on the content they care about.
>>>
>>> >> Things like the cookbook and tutorial should be under the Try Flink
>>> section of the documentation.
>>>
>>> Regarding the position of the "Cookbook" section: as I see it, "Try
>>> Flink" is for new users and the "Cookbook" is for more advanced users.
>>> "Try Flink" can hold the simplest end-to-end examples, such as "Hello
>>> World", while in the "Cookbook" we can add use cases closer to production
>>> business, such as CDN log analysis or PV/UV for e-commerce. So I prefer to
>>> keep the current structure.
>>>
>>> >>  it's relatively straightforward to compare the Python API with the
>>> Java and Scala versions.
>>>
>>> Regarding the comparison between the Python API and the Java/Scala API,
>>> I think the majority of users, especially beginners, would not have this
>>> need. From my side, improving the experience for beginners seems to have
>>> higher priority. Could you please say more about why users want to
>>> compare? How much would the comparison be affected if it is spread over
>>> multiple pages? :)
>>>
>>> Thanks for all of your feedback and suggestions, any follow-up feedback
>>> is welcome.
>>>
>>> Best,
>>> Jincheng
>>>
>>>
>>> David Anderson <[hidden email]> wrote on Mon, Aug 3, 2020 at 10:49 PM:
>>>
>>>> Jincheng,
>>>>
>>>> One thing that I like about the way that the documentation is currently
>>>> organized is that it's relatively straightforward to compare the Python API
>>>> with the Java and Scala versions. I'm concerned that if the PyFlink docs
>>>> are more independent, it will be challenging to respond to questions about
>>>> which features from the other APIs are available from Python.
>>>>
>>>> David
>>>>
>>>> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <[hidden email]>
>>>> wrote:
>>>>
>>>>> Would be great if you could join the contribution of PyFlink
>>>>> documentation @Marta !
>>>>> Thanks for all of the positive feedback. I will start a formal vote
>>>>> then
>>>>> later...
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>>>>
>>>>> Shuiqiang Chen <[hidden email]> wrote on Mon, Aug 3, 2020 at 9:56 AM:
>>>>>
>>>>> > Hi jincheng,
>>>>> >
>>>>> > Thanks for the discussion. +1 for the FLIP.
>>>>> >
>>>>> > Well-organized documentation will greatly improve efficiency and the
>>>>> > experience for developers.
>>>>> >
>>>>> > Best,
>>>>> > Shuiqiang
>>>>> >
>>>>> > Hequn Cheng <[hidden email]> wrote on Sat, Aug 1, 2020 at 8:42 AM:
>>>>> >
>>>>> >> Hi Jincheng,
>>>>> >>
>>>>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>>>>> >>
>>>>> >> I think this will bring big benefits for the PyFlink users. Currently,
>>>>> >> the Python TableAPI document is hidden deeply under the TableAPI&SQL tab
>>>>> >> which makes it quite unreadable. Also, the PyFlink documentation is mixed
>>>>> >> with Java/Scala documentation. It is hard for users to have an overview of
>>>>> >> all the PyFlink documents. As more and more functionalities are added into
>>>>> >> PyFlink, I think it's time for us to refactor the document.
>>>>> >>
>>>>> >> Best,
>>>>> >> Hequn
>>>>> >>
>>>>> >>
>>>>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <
>>>>> [hidden email]>
>>>>> >> wrote:
>>>>> >>
>>>>> >>> Hi, Jincheng!
>>>>> >>>
>>>>> >>> Thanks for creating this detailed FLIP, it will make a big difference in
>>>>> >>> the experience of Python developers using Flink. I'm interested in
>>>>> >>> contributing to this work, so I'll reach out to you offline!
>>>>> >>>
>>>>> >>> Also, thanks for sharing some information on the adoption of PyFlink,
>>>>> >>> it's great to see that there are already production users.
>>>>> >>>
>>>>> >>> Marta
>>>>> >>>
>>>>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <[hidden email]>
>>>>> wrote:
>>>>> >>>
>>>>> >>> > Hi Jincheng,
>>>>> >>> >
>>>>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>>>>> >>> >
>>>>> >>> > Big +1 for improving the structure of PyFlink doc.
>>>>> >>> >
>>>>> >>> > It will be very friendly to give PyFlink users a unified entrance to
>>>>> >>> > learn PyFlink documents.
>>>>> >>> >
>>>>> >>> > Best,
>>>>> >>> > Xingbo
>>>>> >>> >
>>>>> >>> > Dian Fu <[hidden email]> wrote on Fri, Jul 31, 2020 at 11:00 AM:
>>>>> >>> >
>>>>> >>> >> Hi Jincheng,
>>>>> >>> >>
>>>>> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1 to
>>>>> >>> >> improve the Python API doc.
>>>>> >>> >>
>>>>> >>> >> I have received a lot of feedback from PyFlink beginners about the
>>>>> >>> >> PyFlink doc, e.g. there are too few materials, the Python doc is mixed
>>>>> >>> >> with the Java doc, and it's not easy to find the docs they are
>>>>> >>> >> looking for.
>>>>> >>> >>
>>>>> >>> >> I think it would greatly improve the user experience if we can have one
>>>>> >>> >> place which includes most of the knowledge PyFlink users should have.
>>>>> >>> >>
>>>>> >>> >> Regards,
>>>>> >>> >> Dian
>>>>> >>> >>
>>>>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun <[hidden email]> wrote:
>>>>> >>> >>
>>>>> >>> >> Hi folks,
>>>>> >>> >>
>>>>> >>> >> Since the release of Flink 1.11, the number of PyFlink users has
>>>>> >>> >> continued to grow. As far as I know, many companies have used PyFlink
>>>>> >>> >> for data analysis, and operation and maintenance monitoring workloads
>>>>> >>> >> have been put into production (such as 聚美优品[1] (Jumei), 浙江墨芷[2]
>>>>> >>> >> (Mozhi), etc.). According to the feedback we received, the current
>>>>> >>> >> documentation is not very friendly to PyFlink users. There are two
>>>>> >>> >> shortcomings:
>>>>> >>> >>
>>>>> >>> >> - Python-related content is mixed into the Java/Scala documentation,
>>>>> >>> >> which makes it difficult to read for users who only focus on PyFlink.
>>>>> >>> >> - There is already a "Python Table API" section in the Table API
>>>>> >>> >> documentation to hold PyFlink documents, but the number of articles is
>>>>> >>> >> small and the content is fragmented. It is difficult for beginners to
>>>>> >>> >> learn from it.
>>>>> >>> >>
>>>>> >>> >> In addition, FLIP-130 introduced the Python DataStream API. Many
>>>>> >>> >> documents will be added for those new APIs. In order to increase the
>>>>> >>> >> readability and maintainability of the PyFlink documentation, Wei Zhong
>>>>> >>> >> and I have discussed offline and would like to rework it via this FLIP.
>>>>> >>> >>
>>>>> >>> >> We will rework the documentation around the following three objectives:
>>>>> >>> >>
>>>>> >>> >> - Add a separate section for the Python API under the "Application
>>>>> >>> >> Development" section.
>>>>> >>> >> - Restructure the current Python documentation into a brand new
>>>>> >>> >> structure to ensure the content is complete and friendly to beginners.
>>>>> >>> >> - Improve the documents shared by Python/Java/Scala to make them more
>>>>> >>> >> friendly to Python users without affecting Java/Scala users.
>>>>> >>> >>
>>>>> >>> >> More detail can be found in FLIP-133:
>>>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>>>> >>> >>
>>>>> >>> >> Best,
>>>>> >>> >> Jincheng
>>>>> >>> >>
>>>>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>>>>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>>
>>>>> >>
>>>>>
>>>>
>

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

Seth Wiesman
I think this sounds good. +1

On Wed, Aug 5, 2020 at 8:37 PM jincheng sun <[hidden email]>
wrote:

> Hi David, Thank you for sharing the problems with the current document,
> and I agree with you as I also got the same feedback from Chinese users. I
> am often contacted by users to ask questions such as whether PyFlink
> supports "Java UDF" and whether PyFlink supports "xxxConnector". The root
> cause of these problems is that our existing documents are based on Java
> users (text and API mixed part). Since Python is newly added from 1.9, many
> document information is not friendly to Python users. They don't want to
> look for Python content in unfamiliar Java documents. Just yesterday, there
> were complaints from Chinese users about where is all the document entries
> of  Python API. So, have a centralized entry and clear document structure,
> which is the urgent demand of Python users. The original intention of FLIP
> is do our best to solve these user pain points.
>
> Hi Xingbo and Wei Thank you for sharing PySpark's status on document
> optimization. You're right. PySpark already has a lot of Python user
> groups. They also find that Python user community is an important position
> for multilingual support. The centralization and unification of Python
> document content will reduce the learning cost of Python users, and good
> document structure and content will also reduce the Q & A burden of the
> community, It's a once and for all job.
>
> Hi Seth, I wonder if your concerns have been resolved through the previous
> discussion?
>
> Anyway, the principle of FLIP is that in python document should only
> include Python specific content, instead of making a copy of the Java
> content. And would be great to have you to join in the improvement for
> PyFlink (Both PRs and Review PRs).
>
> Best,
> Jincheng
>
>

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

jincheng sun
Thank you for your positive feedback, Seth!
Would you please vote in the voting mail thread? Thank you!

Best,
Jincheng


Seth Wiesman <[hidden email]> wrote on Mon, Aug 10, 2020 at 10:34 PM:

> I think this sounds good. +1
>
> On Wed, Aug 5, 2020 at 8:37 PM jincheng sun <[hidden email]>
> wrote:
>
>> Hi David, thank you for sharing the problems with the current document.
>> I agree with you, as I have received the same feedback from Chinese users. I
>> am often contacted by users asking questions such as whether PyFlink
>> supports "Java UDF" and whether PyFlink supports "xxxConnector". The root
>> cause of these problems is that our existing documents are written for Java
>> users (with text and API content mixed together). Since Python support was
>> only added in 1.9, much of the documentation is not friendly to Python
>> users. They don't want to look for Python content in unfamiliar Java
>> documents. Just yesterday, there were complaints from Chinese users asking
>> where all the document entries for the Python API are. So a centralized
>> entry point and a clear document structure are urgent demands of Python
>> users. The original intention of this FLIP is to do our best to solve these
>> user pain points.
>>
>> Hi Xingbo and Wei, thank you for sharing PySpark's status on document
>> optimization. You're right. PySpark already has a large Python user
>> base. They have also found that the Python user community is important
>> for multi-language support. Centralizing and unifying the Python
>> documentation will reduce the learning cost for Python users, and a good
>> document structure and content will also reduce the Q&A burden on the
>> community. It's a once-and-for-all job.
>>
>> Hi Seth, I wonder if your concerns have been resolved through the
>> previous discussion?
>>
>> Anyway, the principle of the FLIP is that the Python documentation should
>> only include Python-specific content, instead of making a copy of the Java
>> content. It would be great to have you join in the improvement of
>> PyFlink (both contributing PRs and reviewing PRs).
>>
>> Best,
>> Jincheng
>>
>>
>> Wei Zhong <[hidden email]> wrote on Wed, Aug 5, 2020 at 5:46 PM:
>>
>>> Hi Xingbo,
>>>
>>> Thanks for your information.
>>>
>>> I think PySpark's documentation redesign deserves our attention.
>>> It seems that the Spark community has also begun to treat the user
>>> experience of Python documentation more seriously. We can continue to
>>> follow the discussion and progress of the redesign in the Spark
>>> community. It is so similar to our work that there should be some ideas
>>> worth borrowing.
>>>
>>> Best,
>>> Wei
>>>
>>>
>>> On Aug 5, 2020, at 15:02, Xingbo Huang <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I found that the Spark community has also recently been working on
>>> redesigning the PySpark documentation [1]. Maybe we can compare our
>>> document structure with theirs.
>>>
>>> [1] https://issues.apache.org/jira/browse/SPARK-31851
>>>
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>>>
>>> Best,
>>> Xingbo
>>>
>>> David Anderson <[hidden email]> wrote on Wed, Aug 5, 2020 at 3:17 AM:
>>>
>>>> I'm delighted to see energy going into improving the documentation.
>>>>
>>>> With the current documentation, I get a lot of questions that I believe
>>>> reflect two fundamental problems with what we currently provide:
>>>>
>>>> (1) We have a lot of contextual information in our heads about how
>>>> Flink works, and we are able to use that knowledge to make reasonable
>>>> inferences about how things (probably) work in cases we aren't so familiar
>>>> with. For example, I get a lot of questions of the form "If I use <this
>>>> feature> will I still have exactly once guarantees?" The answer is always
>>>> yes, but they continue to have doubts because we have failed to clearly
>>>> communicate this fundamental, underlying principle.
>>>>
>>>> This specific example about fault tolerance applies across all of the
>>>> Flink docs, but the general idea can also be applied to the Table/SQL and
>>>> PyFlink docs. The guiding principles underlying these APIs should be
>>>> written down in one easy-to-find place.
>>>>
>>>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
>>>> E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
>>>> very difficult to answer because it is frequently the case that one has to
>>>> reason about why a given feature doesn't seem to appear in the
>>>> documentation. It could be that I'm looking in the wrong place, or it could
>>>> be that someone forgot to document something, or it could be that it can in
>>>> fact be done by applying a general mechanism in a specific way that I
>>>> haven't thought of -- as in this case, where one can use a JDBC sink from
>>>> Python if one thinks to use DDL.
>>>>
>>>> So I think it would be helpful to be explicit about both what is, and
>>>> what is not, supported in PyFlink. And to have some very clear organizing
>>>> principles in the documentation so that users can quickly learn where to
>>>> look for specific facts.
>>>>
>>>> Regards,
>>>> David
>>>>
>>>>
>>>> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <[hidden email]>
>>>> wrote:
>>>>
>>>>> Hi Seth and David,
>>>>>
>>>>> I'm very happy to have your reply and suggestions. I would like to
>>>>> share my thoughts here:
>>>>>
>>>>> The main motivation for refactoring the PyFlink doc is that we
>>>>> want to make sure that Python users can find everything they need starting
>>>>> from the PyFlink documentation main page. That is, the PyFlink documentation
>>>>> should have a catalogue which includes all the functionalities available in
>>>>> PyFlink. However, this doesn't mean that we will copy content from other
>>>>> parts of the documentation. It may be just a reference/link
>>>>> to the other documentation if needed. For the documentation added under
>>>>> the PyFlink main page, the principle is that it should only include
>>>>> Python-specific content, instead of making a copy of the Java content.
>>>>>
>>>>> >>  I'm concerned that this proposal duplicates a lot of content that
>>>>> will quickly get out of sync. It feels like it is documenting PyFlink
>>>>> separately from the rest of the project.
>>>>>
>>>>> Regarding the concerns about maintainability, as mentioned above, the
>>>>> goal of this FLIP is to provide an intelligible entry point to the Python
>>>>> API, and its content should only contain information that is useful for
>>>>> Python users. There are indeed many agenda items in this FLIP that duplicate
>>>>> the Java documents, but that doesn't mean the content will be copied
>>>>> from the Java documentation. That is, if the content of a document is the
>>>>> same as the corresponding Java document, we will add a link to the Java
>>>>> document, e.g. "Built-in functions" and "SQL". We only create a page for the
>>>>> Python-only content, and then redirect to the Java document if there is
>>>>> something shared with Java, e.g. "Connectors" and "Catalogs". If the
>>>>> document is Python-only and already exists, we will move it from the old
>>>>> Python document to the new Python document, e.g. "Configurations". If the
>>>>> document is Python-only and does not exist yet, we will create a new page
>>>>> for it, e.g. "DataTypes".
>>>>>
>>>>> The main reason we create a new page for Python Data Types is that it
>>>>> corresponds one-to-one with Java Data Types only conceptually;
>>>>> the actual document content would be very different from Java Data Types.
>>>>> Some detailed differences are as follows:
>>>>>
>>>>>
>>>>>   - The text in the Java Data Types document is written for JVM-based
>>>>> language users, which is incomprehensible to users who only understand
>>>>> Python.
>>>>>   - Currently, Python Data Types do not support the "bridgedTo"
>>>>> method, DataTypes.RAW, DataTypes.NULL and User Defined Types.
>>>>>   - The sections "Planner Compatibility" and "Data Type Extraction" are
>>>>> only useful for Java/Scala users.
>>>>>   - We want to add sections which may only apply to Python, such as
>>>>> which Data Types are currently supported in Python, the mapping between
>>>>> DataType and Python object type, etc.
>>>>>
>>>>> I think the root cause of such a difference from existing documents is
>>>>> that Python is the first non-JVM language we support in Flink. This means
>>>>> our previous method of sharing documents between Java and Scala may not be
>>>>> suitable for Python. So we will adopt some very different methods to
>>>>> provide documentation for Python users. Of course, we should reduce
>>>>> maintenance costs as much as possible while ensuring a good user experience.
>>>>> Furthermore, Python is the first step of Flink's multi-language support, and
>>>>> there may be R, Go, etc. in the future. It is necessary for us to create a
>>>>> main page for each language, so that users of each language can focus on
>>>>> the content they care about.
>>>>>
>>>>> >> Things like the cookbook and tutorial should be under the Try Flink
>>>>> section of the documentation.
>>>>>
>>>>> Regarding the position of the "Cookbook" section, in my view "Try
>>>>> Flink" is for new users and the "Cookbook" is for more advanced users.
>>>>> That is, "Try Flink" can hold the simplest end-to-end examples, such as
>>>>> "Hello World", and in the "Cookbook" we can add use cases closer to
>>>>> production business, such as CDN log analysis or PV/UV for e-commerce. So I
>>>>> prefer to keep the current structure.
>>>>>
>>>>> >>  it's relatively straightforward to compare the Python API with the
>>>>> Java and Scala versions.
>>>>>
>>>>> Regarding the comparison between the Python API and the Java/Scala API, I
>>>>> think the majority of users, especially beginners, would not have
>>>>> this need. From my side, improving the experience for beginners
>>>>> seems to have higher priority. Would you please add more input on why
>>>>> users want to compare? How much would the comparison be affected if we put
>>>>> it on multiple pages? :)
>>>>>
>>>>> Thanks for all of your feedback and suggestions, any follow-up
>>>>> feedback is welcome.
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>>>>
>>>>> David Anderson <[hidden email]> wrote on Mon, Aug 3, 2020 at 10:49 PM:
>>>>>
>>>>>> Jincheng,
>>>>>>
>>>>>> One thing that I like about the way that the documentation is
>>>>>> currently organized is that it's relatively straightforward to compare the
>>>>>> Python API with the Java and Scala versions. I'm concerned that if the
>>>>>> PyFlink docs are more independent, it will be challenging to respond to
>>>>>> questions about which features from the other APIs are available from
>>>>>> Python.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <[hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>> It would be great if you could join in contributing to the PyFlink
>>>>>>> documentation, @Marta!
>>>>>>> Thanks for all of the positive feedback. I will start a formal vote
>>>>>>> later...
>>>>>>>
>>>>>>> Best,
>>>>>>> Jincheng
>>>>>>>
>>>>>>>
>>>>>>> Shuiqiang Chen <[hidden email]> wrote on Mon, Aug 3, 2020 at 9:56 AM:
>>>>>>>
>>>>>>> > Hi jincheng,
>>>>>>> >
>>>>>>> > Thanks for the discussion. +1 for the FLIP.
>>>>>>> >
>>>>>>> > Well-organized documentation will greatly improve the efficiency and
>>>>>>> > experience for developers.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> > Shuiqiang
>>>>>>> >
>>>>>>> > Hequn Cheng <[hidden email]> wrote on Sat, Aug 1, 2020 at 8:42 AM:
>>>>>>> >
>>>>>>> >> Hi Jincheng,
>>>>>>> >>
>>>>>>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>>>>>>> >>
>>>>>>> >> I think this will bring big benefits for the PyFlink users. Currently,
>>>>>>> >> the Python TableAPI document is hidden deeply under the TableAPI&SQL tab
>>>>>>> >> which makes it quite unreadable. Also, the PyFlink documentation is mixed
>>>>>>> >> with Java/Scala documentation. It is hard for users to have an overview of
>>>>>>> >> all the PyFlink documents. As more and more functionalities are added into
>>>>>>> >> PyFlink, I think it's time for us to refactor the document.
>>>>>>> >>
>>>>>>> >> Best,
>>>>>>> >> Hequn
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <[hidden email]>
>>>>>>> >> wrote:
>>>>>>> >>
>>>>>>> >>> Hi, Jincheng!
>>>>>>> >>>
>>>>>>> >>> Thanks for creating this detailed FLIP, it will make a big difference in
>>>>>>> >>> the experience of Python developers using Flink. I'm interested in
>>>>>>> >>> contributing to this work, so I'll reach out to you offline!
>>>>>>> >>>
>>>>>>> >>> Also, thanks for sharing some information on the adoption of PyFlink,
>>>>>>> >>> it's great to see that there are already production users.
>>>>>>> >>>
>>>>>>> >>> Marta
>>>>>>> >>>
>>>>>>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <[hidden email]> wrote:
>>>>>>> >>>
>>>>>>> >>> > Hi Jincheng,
>>>>>>> >>> >
>>>>>>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>>>>>>> >>> >
>>>>>>> >>> > Big +1 for improving the structure of PyFlink doc.
>>>>>>> >>> >
>>>>>>> >>> > It will be very friendly to give PyFlink users a unified entrance to learn
>>>>>>> >>> > PyFlink documents.
>>>>>>> >>> >
>>>>>>> >>> > Best,
>>>>>>> >>> > Xingbo
>>>>>>> >>> >
>>>>>>> >>> > Dian Fu <[hidden email]> wrote on Fri, Jul 31, 2020 at 11:00 AM:
>>>>>>> >>> >
>>>>>>> >>> >> Hi Jincheng,
>>>>>>> >>> >>
>>>>>>> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1 to
>>>>>>> >>> >> improve the Python API doc.
>>>>>>> >>> >>
>>>>>>> >>> >> I have received a lot of feedback from PyFlink beginners about
>>>>>>> >>> >> the PyFlink doc, e.g. the materials are too few, the Python doc is mixed
>>>>>>> >>> >> with the Java doc, and it's not easy to find the docs they are looking for.
>>>>>>> >>> >>
>>>>>>> >>> >> I think it would greatly improve the user experience if we can have one
>>>>>>> >>> >> place which includes most of the knowledge PyFlink users should have.
>>>>>>> >>> >>
>>>>>>> >>> >> Regards,
>>>>>>> >>> >> Dian
>>>>>>> >>> >>
>>>>>>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun <[hidden email]> wrote:
>>>>>>> >>> >>
>>>>>>> >>> >> Hi folks,
>>>>>>> >>> >>
>>>>>>> >>> >> Since the release of Flink 1.11, users of PyFlink have continued to grow.
>>>>>>> >>> >> As far as I know there are many companies have used PyFlink for data
>>>>>>> >>> >> analysis, operation and maintenance monitoring business has been put into
>>>>>>> >>> >> production(Such as 聚美优品[1](Jumei),  浙江墨芷[2] (Mozhi) etc.). According to
>>>>>>> >>> >> the feedback we received, current documentation is not very friendly to
>>>>>>> >>> >> PyFlink users. There are two shortcomings:
>>>>>>> >>> >>
>>>>>>> >>> >> - Python related content is mixed in the Java/Scala documentation, which
>>>>>>> >>> >> makes it difficult for users who only focus on PyFlink to read.
>>>>>>> >>> >> - There is already a "Python Table API" section in the Table API document
>>>>>>> >>> >> to store PyFlink documents, but the number of articles is small and the
>>>>>>> >>> >> content is fragmented. It is difficult for beginners to learn from it.
>>>>>>> >>> >>
>>>>>>> >>> >> In addition, FLIP-130 introduced the Python DataStream API. Many
>>>>>>> >>> >> documents will be added for those new APIs. In order to increase the
>>>>>>> >>> >> readability and maintainability of the PyFlink document, Wei Zhong and me
>>>>>>> >>> >> have discussed offline and would like to rework it via this FLIP.
>>>>>>> >>> >>
>>>>>>> >>> >> We will rework the document around the following three objectives:
>>>>>>> >>> >>
>>>>>>> >>> >> - Add a separate section for Python API under the "Application
>>>>>>> >>> >> Development" section.
>>>>>>> >>> >> - Restructure current Python documentation to a brand new structure to
>>>>>>> >>> >> ensure complete content and friendly to beginners.
>>>>>>> >>> >> - Improve the documents shared by Python/Java/Scala to make it more
>>>>>>> >>> >> friendly to Python users and without affecting Java/Scala users.
>>>>>>> >>> >>
>>>>>>> >>> >> More detail can be found in the FLIP-133:
>>>>>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>>>>>> >>> >>
>>>>>>> >>> >> Best,
>>>>>>> >>> >> Jincheng
>>>>>>> >>> >>
>>>>>>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>>>>>>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>>>>>>> >>> >>
>>>>>>> >>> >>
>>>>>>> >>> >>
>>>>>>> >>>
>>>>>>> >>
>>>>>>>
>>>>>>
>>>