> Data never changes, but we have the possibility to create a new version of the data.
Well, it depends on what you mean by data. To avoid ambiguity it is better to talk about data values and data objects which have different properties. This can be formalized as follows [1]:
o data values are modelled via mathematical tuples – tuples are immutable
o data objects are modelled via mathematical functions (a field is a function from the object reference to the field value) – functions are supposed to be mutable
(In reality, of course, we meet quite different situations; for example, structs can be mutable and objects can be immutable.)
Here is one possible implementation of the concept-oriented model of data for data processing. It relies heavily on functions and operations with functions, and is an alternative to purely set-oriented approaches like map-reduce or SQL-style join/group-by:
Functions are a mapping between a domain and a codomain. The mapping itself is not mutable: the definition of the function is that relationship between the domain and the codomain.
If I have a function:
int Add1(int x) => x + 1
I would expect the domain and codomain to be immutable; I would also expect x + 1 not to turn into x / 2 at random.
Assume f: X -> Y. We can now map x_1 to y_1 f(x_1)=y_1. And then change this same function by mapping x_1 to y_2: f(x_1)=y_2. Thus we can easily modify functions. Moreover, we do it constantly when we modify object fields in OOP. It is probably easier to comprehend if a function is represented as a table which we modify.
In contrast, we cannot modify data values (mathematical tuples). Say, x=42+1 means that a new value 43 is created rather than the existing value 42 is modified.
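To make the "function as a table" picture concrete, here is a minimal Python sketch (the dict stands in for the function's table of mappings; the names and values are illustrative):

```python
# A function f: X -> Y represented as a table (dict) of its mappings.
f = {1: 10, 2: 20}   # f(1) = 10, f(2) = 20

# "Modifying the function": remap x = 1 to a new output.
f[1] = 99            # same table, updated mapping: now f(1) = 99

# Values themselves stay immutable: 42 + 1 creates a new value,
# it does not change 42.
x = 42
x = x + 1            # x is rebound to the new value 43; 42 is untouched
```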
> I would expect the domain and codomain to be immutable;
No. Domains, codomains and any set can well be modified by adding or removing tuples. What is immutable are values (in the sets).
> Assume f: X -> Y. We can now map x_1 to y_1 f(x_1)=y_1. And then change this same function by mapping x_1 to y_2: f(x_1)=y_2
They would be different functions, the first being the identity function: x => x, the second being: x => x + 1
> Thus we can easily modify functions. Moreover, we do it constantly when we modify object fields in OOP
This isn't the case. A field with a different value in it just means the object is a different value. If the object is passed to a static function, then the domain is the full set of possible values that the object can hold (this is known as a product type: you multiply the total possible values of each of its component parts to find the size of the domain).
If it's passed to a method then there's an additional implicit argument: `this`, which is the same as a static function with an additional argument that takes the object. The function is the same.
Global (or even free variables) should also be considered part of the domain: i.e. it's akin to implicit arguments that are being passed to the function.
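A minimal sketch of that product-type counting argument in Python (the `Obj` type and its fields are hypothetical, chosen just to keep the sizes small):

```python
from dataclasses import dataclass
from itertools import product

# A hypothetical object with two fields: a bool and a small int.
@dataclass(frozen=True)
class Obj:
    flag: bool      # 2 possible values
    mode: int       # say, values 0..3 -> 4 possible values

# The domain of a function taking Obj is the product of the
# fields' value sets: 2 * 4 = 8 possible inputs.
domain = [Obj(f, m) for f, m in product([False, True], range(4))]
print(len(domain))  # 8
```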
> No. Domains, codomains and any set can well be modified by adding or removing tuples.
This also isn't the case. If a function is defined that takes an integer and returns a boolean value, Int → Bool, then the domain is the set of integers and the codomain is {True, False}. You can't pass a tuple to a function that takes an Int and thereby dynamically increase the size of the domain. Even in dynamic languages the codomain is effectively `top`, the type that holds all values; the domain is then all values and the codomain is all values, which makes them immutable still.
Now maybe I am misunderstanding you, but this is how all of the mainstream statically and dynamically typed languages work. Perhaps there's some edge-case language that I'm missing here that allows types to be extended, which would be interesting in its own right.
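For illustration, a minimal Python sketch of a fixed domain (Python does not enforce annotations at runtime, so the check is written out by hand; the function name is made up):

```python
def is_even(n: int) -> bool:
    # Enforce the declared domain explicitly: anything that is not
    # an int is simply outside the function's domain.
    if not isinstance(n, int):
        raise TypeError("domain of is_even is Int, not " + type(n).__name__)
    return n % 2 == 0

print(is_even(4))        # True
try:
    is_even((1, 2))      # a tuple is outside the domain
except TypeError as e:
    print(e)
```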
Can you expand upon this? Perhaps the difference between "re-mapping" the function:
f(x_1)=y_2
and "re-mapping" the value:
x=42+2
How is the former different from the latter? And by what mechanism is the former achieved? I understand what you are saying, but how does one simply "change this same function"? Redefine it?
To be clear, I'm not suggesting you are incorrect. I just don't fully understand what you are getting at.
I agree that data modeling is underestimated, but it can hardly be considered a solved problem. It is very hard because there are numerous alternative understandings and formal definitions of what we mean by data (RM, OO, OR, MD, etc.). In addition, there are several levels of representation (physical, logical, semantic). In real projects, they all are mixed.
The concept-oriented model tries to overcome some problems of RM by relying on two constructs: sets and functions. In contrast, RM uses only sets. The idea is that data can be stored in functions and transformed via operations with functions.
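As a rough illustration of "data stored in functions and transformed via operations with functions" (only a sketch in plain Python, not the model's actual formalism; dicts stand in for stored functions):

```python
# Two "functions" mapping order IDs to data (columns as functions).
price    = {101: 10.0, 102: 25.0, 103: 7.5}
quantity = {101: 3,    102: 1,    103: 4}

# A derived function defined by an operation on existing functions,
# rather than by a set-oriented join of tables.
def total(order_id):
    return price[order_id] * quantity[order_id]

print(total(101))  # 30.0
```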
Sure, but we don't typically do sophisticated machine learning on them. The vast majority of modern CPUs have vector instructions. Even a Raspberry Pi's ARM has NEON.
Excuse me if this sounds stupid, but vector instructions are assembly. I know we can use inline assembly or compile and link assembly alongside C, but isn't it the compiler that is in charge of using vector instructions?
IIRC GCC has -mmmx and -msse/-msse2/-msse3/-msse4 options to enable these kinds of instructions.
Sure, if the compiler can find optimizations by inserting vector instructions, it will. But, typically you'll want to specifically format your code using matrices/a library like BLAS to maximize performance and use as many vector instructions as possible.
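For example, in Python the practical way to reach those vector instructions is to hand the inner loop to a BLAS-backed library call rather than writing the loop yourself (a small NumPy sketch; how much SIMD you actually get depends on your BLAS build):

```python
import numpy as np

n = 1000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)

# Plain Python loop: one scalar multiply-add per iteration,
# with interpreter overhead on every step.
slow = 0.0
for x, y in zip(a, b):
    slow += x * y

# np.dot dispatches to a BLAS routine (ddot here); BLAS builds are
# typically compiled to use the CPU's vector (SIMD) instructions.
fast = float(np.dot(a, b))

print(abs(slow - fast) < 1e-6)  # True: same result, very different speed
```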
"Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits."
There are many projects aimed at making Excel "a thing of the past" but they focus on different needs:
I think this is why Excel/Google Sheets still dominate and will continue to.
All of these products do one or a few things that Excel or Google Sheets do, but perhaps they make it a little easier for a novice. I think what people don't understand about Excel (and to a lesser extent Google Sheets) is that it's an IDE. A novice can build interesting things, a power user can build incredible things.
The only way to make Excel a thing of the past would be to make a blow-away awesome replacement that does everything excel does but better. There's plenty of blue ocean around Excel and I think each of the products you listed could do just fine.
I would love all of Excel's power available to me but delivered like Google Sheets. That would definitely kill Excel. So far neither Microsoft nor Google seem really committed to this. Google Sheets is nice but just grabs the low hanging spreadsheet fruit. Office 365 is anemic.
If I had the time and funding I'd love to make a true Excel killer that was a faithful recreation of ALL of Excel's capabilities but delivered in a modern way. I'd pay good money for this. I believe many would. Excel may be a dinosaur, but it's still the apex predator.
I can't speak for ryanmarsh, but I have a few thoughts.
* Excel really has two major ways of performing computations on data - formulas within the grid and actions upon the grid. Despite the utility of the formula based dataflow model, there are too many operations that have to be performed as one-shot operations via commands (or scripted via VBA). Having formula based approaches for sorting, dividing into bins, etc. would be very useful.
* It'd be nice if Excel cells could contain values other than scalars. (Arrays, tuples, lists, maps, matrices, complex numbers, etc.)
* VBA can be used to define custom functions, but there's a lot of marshalling overhead going to VBA and the programming model is slightly different. It'd be nice to fix both of those issues.
* There's no way to locally bind names within a cell formula, so often subexpressions have to be duplicated. (And I believe they're doubly evaluated too.)
Shameless plug: I'm a founder of Alphasheets, a company seeking to solve problems like these! I couldn't resist replying after seeing these comments.
We make a collaborative (Google Sheets style) spreadsheet with Python and R running in the sheet. You can define functions, plot using ggplot, embed pandas dataframes, numpy matrices and all that good stuff. We don't let people use macros; all the code runs in cells because we think macros are too brittle. You can check out the website at http://alphasheets.com .
We're seeing that many enterprises (for example, in finance) that have Excel power users are moving to Python because of limitations like these, and are running into adoption issues because people like spreadsheets so much. That's generally where we come in and provide a bridge from the Excel world to Python through a more friendly frontend.
We're also seeing that Alphasheets can help a lot with shortening feedback cycles on more sophisticated data analyses: Excel is the most popular self-serve analytics tool out there, but doesn't cover cases where you need Python/R/fresh data.
This is very nice. Problem is, there are sooo many more features in Excel you'll have to copy to get me to move. If you ask "which ones" I'll say "all of them". I'm a power user. I build huge dashboards and analytical tools in Excel. The thing I hate most is that all my work goes into a file that I have to pray works on the other person's computer.
The product is great. But you guys will need to launch a fully feature rich desktop client, which can sync with the cloud.
Else it's the same thing mentioned in the previous comments. You would build a web app with 5% of the features of Excel, and the moment somebody reaches a use case that can't be solved with your tool, they will have to switch to Excel. If they have to switch every second time they use your product, they might as well do all their work in Excel to begin with.
You have to be feature compliant with excel and you can't do that on a web app alone.
* see a modern replacement for VBA, dare I say using JS
* be able to share a document that won't break when someone opens it on their computer (even if it's using all the Excel bells and whistles, including external data sources and plugins). Google Sheets, by contrast, is just a link.
* be able to use all the amazing features via the web and/or an app
Let's call Google Sheets "modern" because it can be used from an app or any web browser. I can share a Google Sheet much easier than an Excel file using all the bells and whistles.
The problem is, Excel has a ton of very powerful features, many of which Google Sheets doesn't provide. Something like VBA would be nice. I'm aware you can write JS plugins for Google Sheets, but the experience is nowhere near as good. Pivot tables in Excel still smoke Google Sheets.
The witheve.com stuff (and the underlying "differential dataflow") is also interesting as a model for derived data which updates itself. I'm keeping an eye on that project too.
As far as your site goes (I just took a brief glance), if you haven't seen it already, you might find some interesting ideas in the sieuferd project:
I wonder if a kind of hybrid programming would be possible which switches between this dataflow-like functionality for parts and more traditional ('large blocks of text'-based) techniques for other parts.
I was working on a simple framework a while ago where the highest-level organizational structure was a 'domain' and these domains would connect to one another via 'converters'. I think the dataflow format would work really well for defining and linking up domains, and small functions that do things like filtering would work well within converters—but then maybe within particular domains it's somewhat of a free-for-all again (i.e. you use traditional programming techniques). Just thought I'd share the idea on the off chance that it sparks something for ya—I'm not really doing anything with that project at the moment.
I'm also curious why you prefer the tabular format over something graph-based. Is it just that it's more straightforward for people to lay things out/organize?
I'm not currently a user but https://www.smartsheet.com/ had pretty advanced features that I liked. The UI was a bit old-fashioned but the tool is capable.
From this (and many other) tutorial it is not clear if tensors in tensorflow are true mathematical tensors (that is, having covariant and contravariant indices) or they are multidimensional arrays. The name Tensorflow and terminology suggests that Tensorflow manipulates mathematical tensors, for example:
but what you see are multidimensional arrays. It is of course not a big problem but probably could be clarified somewhere at least in small font to avoid ambiguity. Or Tensorflow objects are true tensors indeed?
Even if they were (I doubt), I haven't found a clear and informed description how they relate exactly to tensors found in math/physics literature. I agree with your view that they look more like nd-arrays.
I was surprised when I first saw the word "tensor" being thrown around by computer scientists to apparently mean just multi-dimensional array. But then I thought, well, "vector" is very widely used - including by mathematicians - to mean simply an nx1 or 1xn array, rather than an object which transforms a certain way under coordinate transformations. So in the same way, I suppose we really might as well use "tensor" to mean "just" a multi-dimensional array of numbers, in contexts where coordinate transformations aren't important. Mathematical physics can continue to use the other definition where necessary, just as it does for vectors.
The trouble with that approach is that in CS, tensors are mostly used in machine learning, which is very math-dependent. So, you read in a textbook or a paper that something can be done elegantly by using some linear algebra operation, or some transformation on a tensor, and are delighted, because your library says to be tensor-based, but, then, when you try to code it, whoops; you meant you had tensor support, but all you've got is a multidimensional array memory layout...
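To make the distinction concrete: what the library stores is only the component array, and the tensor transformation law has to be applied by hand. A small NumPy sketch for a (1,1)-tensor under a change of basis (the matrices are arbitrary examples):

```python
import numpy as np

# Components of a (1,1)-tensor (a linear map) in some basis:
# just a plain 2-D array as far as the library is concerned.
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# A change of basis A (invertible). The library does not know or
# enforce the tensor transformation law; we apply it ourselves:
#   T'^i_j = A^i_k  T^k_l  (A^-1)^l_j
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
A_inv = np.linalg.inv(A)

T_prime = np.einsum('ik,kl,lj->ij', A, T, A_inv)

# Invariants of the underlying tensor (e.g. the trace) are preserved,
# even though the raw array entries change.
print(np.trace(T), np.trace(T_prime))  # 5.0 5.0
```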
The lifetime word limit is too rigid a constraint. There should be a way to acquire (or lose) points that you can later spend on publishing your results. Then the "price" of publishing negative results could be lower than that of other results.
HPAT looks pretty promising. I wonder how they managed to significantly increase the performance of shuffling and sorting, which are known to be quite difficult operations in map-reduce.
[1] Concept-oriented model: Modeling and processing data using functions https://www.researchgate.net/publication/337336089_Concept-o...