1148. 文章浏览 I
1148. 文章浏览 I
题目
Table: Views
+---------------+---------+ | Column Name | Type | +---------------+---------+ | article_id | int | | author_id | int | | viewer_id | int | | view_date | date | +---------------+---------+
There is no primary key (column with unique values) for this table, the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some date.
Note that equal author_id and viewer_id indicate the same person.
Write a solution to find all the authors that viewed at least one of their own articles.
Return the result table sorted by id
in ascending order.
The result format is in the following example.
Example 1:
Input:
Views table:
+------------+-----------+-----------+------------+ | article_id | author_id | viewer_id | view_date | +------------+-----------+-----------+------------+ | 1 | 3 | 5 | 2019-08-01 | | 1 | 3 | 6 | 2019-08-02 | | 2 | 7 | 7 | 2019-08-01 | | 2 | 7 | 6 | 2019-08-02 | | 4 | 7 | 1 | 2019-07-22 | | 3 | 4 | 4 | 2019-07-21 | | 3 | 4 | 4 | 2019-07-21 | +------------+-----------+-----------+------------+
Output:
+------+ | id | +------+ | 4 | | 7 | +------+
题目大意
Views
表:
+---------------+---------+ | Column Name | Type | +---------------+---------+ | article_id | int | | author_id | int | | viewer_id | int | | view_date | date | +---------------+---------+
此表可能会存在重复行。(换句话说,在 SQL 中这个表没有主键)
此表的每一行都表示某人在某天浏览了某位作者的某篇文章。
请注意,同一人的 author_id 和 viewer_id 是相同的。
请查询出所有浏览过自己文章的作者
结果按照 id
升序排列。
查询结果的格式如下所示:
示例 1:
输入:
Views 表:
+------------+-----------+-----------+------------+ | article_id | author_id | viewer_id | view_date | +------------+-----------+-----------+------------+ | 1 | 3 | 5 | 2019-08-01 | | 1 | 3 | 6 | 2019-08-02 | | 2 | 7 | 7 | 2019-08-01 | | 2 | 7 | 6 | 2019-08-02 | | 4 | 7 | 1 | 2019-07-22 | | 3 | 4 | 4 | 2019-07-21 | | 3 | 4 | 4 | 2019-07-21 | +------------+-----------+-----------+------------+
输出:
+------+ | id | +------+ | 4 | | 7 | +------+
解题思路
MySQL
返回字段:
SELECT
指定要返回的字段:DISTINCT author_id AS id
,表示需要返回去重后的作者 ID。筛选条件:
author_id = viewer_id
:筛选出作者浏览自己文章的记录。- 使用
WHERE
子句将条件应用到Views
表中。
结果排序:
使用ORDER BY author_id ASC
按照作者 ID 升序排序。
复杂度分析
- 时间复杂度:
O(n)
,假设表中有n
条记录,需要遍历所有记录来评估条件。 - 空间复杂度:查询本身不占用额外空间,返回的结果占用的空间与满足条件的记录数成正比。
Pandas
加载数据:将
Views
表加载到 Pandas 的DataFrame
中。筛选条件:使用 Pandas 的布尔索引筛选出
author_id
等于viewer_id
的记录。去重操作:使用
.drop_duplicates()
方法获取去重后的author_id
。结果排序:使用
.sort_values(by='author_id')
对结果按升序排序。重命名:使用
.rename(columns={'author_id': 'id'})
对列进行重命名
代码
SELECT DISTINCT author_id AS id
FROM Views
WHERE author_id = viewer_id
ORDER BY author_id ASC;
import pandas as pd
def authors_who_viewed_their_articles(views: pd.DataFrame) -> pd.DataFrame:
# 筛选 author_id == viewer_id 的记录,并去重和排序
return views[views['author_id'] == views['viewer_id']][['author_id']].drop_duplicates().sort_values(by='author_id').rename(columns={'author_id': 'id'})