Introducing HTQL (Hyper Text Query Language) - Seeking Feedback, maybe contributors
Posted by docaicdev@reddit | programming | 15 comments
justwakemein2020@reddit
I don't quite see the full rationale here.
You're leveraging SQL which is a syntax for querying tabular data, in the hopes that it will aid in extracting data from a document structure like the DOM?
Why would being compatible with implementation-agnostic SQL adapters even matter? How often are people using client-side SQL data sources? And even in those cases, don't they come with their own native, purpose-built APIs anyway?
CodeAndBiscuits@reddit
To this point: HTML is hierarchical, and hierarchical querying is absolutely SQL's worst skill. Graph databases were invented partly for this reason. Trees of data make for huge joins and sometimes very odd subquery logic that can be hard to follow.
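As a rough illustration of that point (my own sketch, not anything from HTQL), the code below stores a tiny nested list in one common relational encoding of a tree: an adjacency list where each row records its parent_id. The nodes table and its columns are made up for this example, and it uses Python's built-in sqlite3 module. A single self-join only reaches direct children; collecting all descendants needs a recursive CTE, which is exactly the kind of subquery logic that gets hard to follow as documents grow.

```python
import sqlite3

# Hypothetical adjacency-list encoding of:
# <body><h2>Countries</h2><ul><li>France</li><li>Japan<ul><li>Tokyo</li></ul></li></ul></body>
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        (1, None, "body", None),
        (2, 1, "h2", "Countries"),
        (3, 1, "ul", None),
        (4, 3, "li", "France"),
        (5, 3, "li", "Japan"),
        (6, 5, "ul", None),
        (7, 6, "li", "Tokyo"),
    ],
)

# A plain self-join only reaches the direct children of node 3 (the outer <ul>).
children = conn.execute(
    "SELECT c.text FROM nodes p JOIN nodes c ON c.parent_id = p.id "
    "WHERE p.id = 3 AND c.tag = 'li' ORDER BY c.id"
).fetchall()

# Reaching <li> elements at any depth needs a recursive common table expression.
descendants = conn.execute(
    """
    WITH RECURSIVE sub(id) AS (
        SELECT id FROM nodes WHERE parent_id = 3
        UNION ALL
        SELECT n.id FROM nodes n JOIN sub ON n.parent_id = sub.id
    )
    SELECT n.text FROM nodes n JOIN sub ON n.id = sub.id
    WHERE n.tag = 'li' ORDER BY n.id
    """
).fetchall()

print([t for (t,) in children])     # ['France', 'Japan']
print([t for (t,) in descendants])  # ['France', 'Japan', 'Tokyo']
```

The same "all li under this ul" question is just "ul li" as a CSS selector or "//ul//li" in XPath.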
I wonder if OP would consider pivoting to what jq does for JSON data, which is much more similar in structure.
HolyPommeDeTerre@reddit
Yeah, a graph query language (Cypher, for example) would match the structure better. Also, the markup language is highly hierarchical, so it greatly simplifies what the graph query language has to handle (no circular refs, no multiple direct parents for a node...).
behind-UDFj-39546284@reddit
Please, no query language instead of regexps. Use the right tool wherever it fits.
Cold_Meson_06@reddit
What is the output of a query when you select a span or *?
Looks good as a toy, but if I were looking for something more powerful than document.querySelector or XPath, I would expect it to be something like GraphQL, or the syntax tools like SASS/SCSS use to select nodes, plus some syntactic sugar on top for operators like ~= and friends, kinda like those DSLs that compile to regex strings.
At first, for me, it just looks like SQL shoehorned into selecting data from a tree structure, but maybe I just didn't see the more complex examples. Can you give one where a tool like this would be really good and the alternatives would be more verbose or hard to maintain?
Also, can you explain what you mean by "easy to use with other SQL adapters"? Idk what it means but I'm also unfamiliar with DB terminology.
badpotato@reddit
How about a converter between XPath and this?
hinckley@reddit
At first sight, from the examples, this feels a lot like a very verbose equivalent to query selector syntax. Could you explain what you're aiming to do that query selectors won't?
docaicdev@reddit (OP)
Query selectors are fine, but it’s essential to also have a programmatic way of extracting elements. Ideally, you’d implement this in a language like Python, TypeScript, or another suitable option to allow more complex data querying and logic, such as using OR/AND operations. My idea is to use a powerful, proven query language like SQL for this purpose. SQL has been tested over decades, is widely known, and provides a standardized interface that works with many implementations, like JPA. This might be a step for the future, but it offers a strong foundation.
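A minimal sketch of the kind of programmatic AND/OR filtering described above, assuming elements are first flattened into plain dicts with Python's standard html.parser module. ElementCollector and the sample markup are hypothetical, and this is not how HTQL itself works; the comment shows the roughly equivalent SQL-style WHERE clause. For comparison, a CSS selector list expresses a similar filter, a.nav, a[href^="https://"], because compound selectors act as AND and commas act as OR.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Flatten every start tag into a dict holding its tag name and attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.elements.append({"tag": tag, "attrs": dict(attrs)})


html = """
<a href="/docs" class="nav">Docs</a>
<a href="https://example.com" class="ext">Example</a>
<a href="#top" class="skip">Top</a>
<img src="logo.png" alt="">
<button class="nav">Menu</button>
"""

collector = ElementCollector()
collector.feed(html)
collector.close()

# Roughly: WHERE tag = 'a' AND (class = 'nav' OR href LIKE 'https://%')
matches = [
    e
    for e in collector.elements
    if e["tag"] == "a"
    and (
        e["attrs"].get("class") == "nav"
        or (e["attrs"].get("href") or "").startswith("https://")
    )
]

print([e["attrs"]["href"] for e in matches])  # ['/docs', 'https://example.com']
```

One difference worth noting: the class check here is exact string equality, while the CSS .nav selector matches "nav" as one word in a space-separated class list.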
Additionally, I considered adding a future feature to introduce a JOIN-like expression. This would allow combining outputs from multiple remote or local documents.
usrlibshare@reddit
It has, it is, and it does.
For relational data organised in tables.
Care to explain how that paradigm maps to a nested-elements based markup language?
gredr@reddit
SQL is powerful, but it's... weird. Even if you're used to it and can work well with it, objectively, it's sorta backwards. It's an artifact of the era where we thought that programming would be easier if it was more like English (see COBOL).
That being said, SQL was designed to query relational data. HTML is not relational data. Instead of building on a foundation designed for the type of structure you're querying, you built on a foundation designed for a wholly different structure (and one of questionable design to begin with).
propeller-90@reddit
I'm sceptical.
For example, how would you select the list items in the list after the heading with id "countries"? (In selector notation that'd be #countries ~ ul li, I think.)
docaicdev@reddit (OP)
hm, I guess something like: "SELECT ul FROM document WHERE attributes.id = 'countries'" and then simply access the child elements
NenAlienGeenKonijn@reddit
That would imply that you select a ul element with id countries. What he wants is the li elements that are the children of the first ul element that comes after a header with id 'countries'.
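To make the selector's meaning concrete, here is a sketch using Python's standard xml.etree.ElementTree on a small, made-up XHTML-style document. ElementTree needs well-formed markup and supports only a subset of XPath with no sibling axes, so the traversal is written by hand; countries_list_items is a hypothetical helper. Note that ~ is the general sibling combinator: it matches every later sibling ul, not only the first one, whereas #countries + ul li would require the ul to come immediately after the heading. The XPath equivalent is //*[@id='countries']/following-sibling::ul//li. The sketch also runs the query OP proposed, which finds nothing because no ul carries the id itself.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample document.
doc = ET.fromstring("""<body>
  <ul><li>Not this one</li></ul>
  <h2 id="countries">Countries</h2>
  <p>Intro text</p>
  <ul><li>France</li><li>Japan</li></ul>
  <ul><li>Peru</li></ul>
</body>""")


def countries_list_items(root):
    """Emulate the CSS selector '#countries ~ ul li' on an ElementTree tree."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    heading = next(el for el in root.iter() if el.get("id") == "countries")
    siblings = list(parent_of[heading])
    later = siblings[siblings.index(heading) + 1:]
    # '~' keeps every later sibling <ul>; the space (descendant combinator)
    # then matches <li> elements at any depth inside each of them.
    return [li.text for ul in later if ul.tag == "ul" for li in ul.iter("li")]


# OP's suggested query: a <ul> whose own id is 'countries'.
op_query = [ul for ul in doc.iter("ul") if ul.get("id") == "countries"]

print(op_query)                   # []
print(countries_list_items(doc))  # ['France', 'Japan', 'Peru']
```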
A SQL syntax seems like a funny idea at first, but it is utterly inadequate for querying document structures. That's why XPath exists.
docaicdev@reddit (OP)
Definitely, interesting point 🤔 need to think about that example